Once upon a time, Nir Soffer <nsoffer(a)redhat.com> said:
oVirt is reading 4k from the metadata special volume every 10 seconds.
If the read takes more than 5 seconds, you will see this warning in the
engine event log.
Maybe your storage or the host was overloaded at that time (e.g. vm backup)?
I don't see any evidence that the storage was having any problem. The
times the message gets logged are not high-load times either (neither
during scheduled backups nor during periods of high demand).
I wrote a perl script to replicate the check, and I ran it on a node in
maintenance mode (so no other traffic on the node). My script opens a
block device with O_DIRECT, reads the first 4K, and closes it, reporting
the time. I do see some latency jumps with that check, but not on the
raw block device, just the LV.
By that I mean I'm running it on two devices: the multipath device that
is the PV and the metadata LV. The multipath device latency is pretty
stable, running around 0.3 to 0.5ms. The LV latency is normally only a
little higher, but it is much more variable and spikes to 50-125ms (at
the same time that reading the multipath device took under 0.5ms).
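
A minimal Python sketch of this kind of check, for anyone who wants to
reproduce it. This is not the Perl script described above, only an
illustration of the same steps: open the device with O_DIRECT, read the
first 4K, close it, and report the elapsed time. The function name
timed_read and the direct switch are made up for the example, and the
anonymous mmap buffer is just one way to get the page-aligned memory
that O_DIRECT reads need on Linux.

```python
import mmap
import os
import sys
import time

BLOCK_SIZE = 4096  # size of the read described in the thread


def timed_read(path, size=BLOCK_SIZE, direct=True):
    """Open path, read the first `size` bytes, close it.

    Returns (elapsed_seconds, bytes_read). The timing covers open, read
    and close. `timed_read` and `direct` are hypothetical names used
    only for this sketch.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT  # Linux-only: bypass the page cache
    # An anonymous mmap is page-aligned, which O_DIRECT reads require.
    buf = mmap.mmap(-1, size)
    try:
        start = time.perf_counter()
        fd = os.open(path, flags)
        try:
            nread = os.readv(fd, [buf])
        finally:
            os.close(fd)
        elapsed = time.perf_counter() - start
    finally:
        buf.close()
    return elapsed, nread


if __name__ == "__main__" and len(sys.argv) > 1:
    # Each argument is a device path, e.g. the multipath device and the LV.
    for path in sys.argv[1:]:
        elapsed, nread = timed_read(path)
        print(f"{path}: {nread} bytes in {elapsed * 1000:.3f} ms")
```

Run it as root with the multipath device and the metadata LV as
arguments, repeating it on an interval to compare the two latencies side
by side. Passing direct=False allows a quick try against an ordinary
file on a filesystem that does not support O_DIRECT.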
Seems like this might be a problem somewhere in the Linux logical volume
layer, not the block or network layer (or with the network/storage
itself).
--
Chris Adams <cma(a)cmadams.net>