
Once upon a time, Nir Soffer <nsoffer@redhat.com> said:
oVirt is reading 4k from the metadata special volume every 10 seconds. If the read takes more than 5 seconds, you will see this warning in the engine event log.
Maybe your storage or the host was overloaded at that time (e.g. vm backup)?

I don't see any evidence that the storage was having any problem. The message also doesn't get logged at high-load times (neither during scheduled backups nor during periods of high demand).

I wrote a Perl script to replicate the check, and I ran it on a node in maintenance mode (so no other traffic on the node). My script opens a block device with O_DIRECT, reads the first 4K, and closes it, reporting the time.

I do see some latency jumps with that check, but not on the raw block device, just the LV. By that I mean I'm running it on two devices: the multipath device that is the PV, and the metadata LV. The multipath device latency is pretty stable, running around 0.3 to 0.5ms. The LV latency is normally just a little higher, but it varies more and spikes to 50-125ms (at the same time that reading the multipath device took under 0.5ms).

It seems like this might be a problem somewhere in the Linux logical volume layer, not in the block or network layer (or with the network/storage itself).

-- 
Chris Adams <cma@cmadams.net>
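
For anyone who wants to repeat the measurement, here is a rough Python sketch of the kind of check described above. It is not the original Perl script. The function names (timed_read, monitor), the decision to include open and close in the timed interval, and the output format are all assumptions made for illustration. The 4K read size, the 10 second interval and the 5 second warning threshold are taken from the quoted description. O_DIRECT needs an aligned buffer, so the sketch reads into an anonymous mmap, which is page-aligned. It is Linux-only.

```python
import mmap
import os
import sys
import time

BLOCK_SIZE = 4096  # "reads the first 4K"


def timed_read(path, direct=True):
    """Open path, read the first 4K, close it; return (bytes_read, seconds).

    With direct=True the device is opened with O_DIRECT to bypass the page
    cache. The timed interval covers open + read + close (an assumption).
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT
    # An anonymous mmap is page-aligned, which satisfies O_DIRECT alignment.
    buf = mmap.mmap(-1, BLOCK_SIZE)
    try:
        start = time.monotonic()
        fd = os.open(path, flags)
        try:
            nread = os.readv(fd, [buf])
        finally:
            os.close(fd)
        elapsed = time.monotonic() - start
    finally:
        buf.close()
    return nread, elapsed


def monitor(paths, interval=10.0, warn_after=5.0, count=None, direct=True):
    """Time a 4K read of each path every `interval` seconds.

    Runs forever when count is None, otherwise for `count` rounds.
    """
    rounds = 0
    while count is None or rounds < count:
        for path in paths:
            _, elapsed = timed_read(path, direct=direct)
            flag = " SLOW" if elapsed > warn_after else ""
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} {path} {elapsed * 1000:.3f} ms{flag}")
        rounds += 1
        if count is None or rounds < count:
            time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the devices to compare, e.g. the multipath PV and the metadata LV.
    monitor(sys.argv[1:])
```

Run it as root with both device paths as arguments (the multipath device and the metadata LV) so the two latencies are printed side by side in each round. A shorter interval than 10 seconds makes the spikes easier to catch.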