
I've been getting an occasional message like:

Storage domain hosted_storage experienced a high latency of 5.26121 seconds from host node3.

I'm not sure what is causing them though. I look at my storage (EqualLogic iSCSI SAN) and storage network switches and don't see any issues.

When the above message was logged, node3 was not hosting the engine (doesn't even have engine HA installed), nor was it the SPM, so why would it have even been accessing the hosted_storage domain?

This is with oVirt 4.1.

--
Chris Adams <cma@cmadams.net>

On Tue, Apr 11, 2017 at 3:57 PM, Chris Adams <cma@cmadams.net> wrote:
I've been getting an occasional message like:
Storage domain hosted_storage experienced a high latency of 5.26121 seconds from host node3.
I'm not sure what is causing them though. I look at my storage (EqualLogic iSCSI SAN) and storage network switches and don't see any issues.
When the above message was logged, node3 was not hosting the engine (doesn't even have engine HA installed), nor was it the SPM, so why would it have even been accessing the hosted_storage domain?
All hosts are monitoring their access to all storage domains in the data center. Y.
This is with oVirt 4.1.

--
Chris Adams <cma@cmadams.net>

Once upon a time, Yaniv Kaul <ykaul@redhat.com> said:
On Tue, Apr 11, 2017 at 3:57 PM, Chris Adams <cma@cmadams.net> wrote:
I've been getting an occasional message like:
Storage domain hosted_storage experienced a high latency of 5.26121 seconds from host node3.
I'm not sure what is causing them though. I look at my storage (EqualLogic iSCSI SAN) and storage network switches and don't see any issues.
When the above message was logged, node3 was not hosting the engine (doesn't even have engine HA installed), nor was it the SPM, so why would it have even been accessing the hosted_storage domain?
All hosts are monitoring their access to all storage domains in the data center.
Okay. Is there any more information about what this message actually means, though? Is it read latency, write latency, a particular VM, etc.?

I can't find any issue at the network or SAN level, nor any load events that correlate with the times oVirt logs the latency messages.

--
Chris Adams <cma@cmadams.net>

On Thu, Apr 13, 2017, 16:03, Chris Adams <cma@cmadams.net> wrote:
Once upon a time, Yaniv Kaul <ykaul@redhat.com> said:
On Tue, Apr 11, 2017 at 3:57 PM, Chris Adams <cma@cmadams.net> wrote:
I've been getting an occasional message like:
Storage domain hosted_storage experienced a high latency of 5.26121 seconds from host node3.
I'm not sure what is causing them though. I look at my storage (EqualLogic iSCSI SAN) and storage network switches and don't see any issues.
When the above message was logged, node3 was not hosting the engine (doesn't even have engine HA installed), nor was it the SPM, so why would it have even been accessing the hosted_storage domain?
All hosts are monitoring their access to all storage domains in the data center.
Okay. Is there any more information about what this message actually means though? Is it read latency, write latency, a particular VM, etc.?
I can't find any issue at the network or SAN level, nor any load events that correlate with the times oVirt logs the latency messages.
oVirt is reading 4k from the special metadata volume every 10 seconds. If the read takes more than 5 seconds, you will see this warning in the engine event log.

Maybe your storage or the host was overloaded at that time (e.g. a VM backup)?

Nir
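[For reference, a minimal Python sketch of the kind of periodic check Nir describes. This is not VDSM's actual code; the metadata device path is a placeholder, and the 10-second interval and 5-second threshold are taken from his description.]

    import mmap
    import os
    import time

    METADATA_DEV = "/dev/<vg>/metadata"  # placeholder path for the domain's metadata LV
    INTERVAL = 10      # seconds between checks, per Nir's description
    THRESHOLD = 5.0    # seconds; above this the engine logs the high-latency warning

    def read_4k_direct(path):
        """Time a single 4 KiB O_DIRECT read from the start of the device."""
        buf = mmap.mmap(-1, 4096)  # anonymous mmap is page-aligned, as O_DIRECT requires
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        try:
            start = time.monotonic()
            os.readv(fd, [buf])
            return time.monotonic() - start
        finally:
            os.close(fd)

    while True:
        delay = read_4k_direct(METADATA_DEV)
        if delay > THRESHOLD:
            print("high latency: %.5f seconds" % delay)
        time.sleep(INTERVAL)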

Once upon a time, Nir Soffer <nsoffer@redhat.com> said:
oVirt is reading 4k from the special metadata volume every 10 seconds. If the read takes more than 5 seconds, you will see this warning in the engine event log.
Maybe your storage or the host was overloaded at that time (e.g. a VM backup)?
I don't see any evidence that the storage was having any problem. The times the message gets logged are not high-load times either (either scheduled backups or just high demand).

I wrote a perl script to replicate the check, and I ran it on a node in maintenance mode (so no other traffic on the node). My script opens a block device with O_DIRECT, reads the first 4K, and closes it, reporting the time.

I do see some latency jumps with that check, but not on the raw block device, just the LV. By that I mean I'm running it on two devices: the multipath device that is the PV, and the metadata LV. The multipath device latency is pretty stable, running around 0.3 to 0.5 ms. The LV latency is normally only a little higher, but has more variability and spikes to 50-125 ms (at the same time that reading the multipath device took under 0.5 ms).

Seems like this might be a problem somewhere in the Linux logical volume layer, not the block or network layer (or with the network/storage itself).

--
Chris Adams <cma@cmadams.net>
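[A minimal Python version of the check Chris describes (his actual script was Perl, which is not shown in the thread). The device paths are hypothetical placeholders for the multipath PV and the metadata LV; sampling both side by side is what exposes the LV-only spikes he reports.]

    import mmap
    import os
    import time

    def timed_4k_read(path):
        """Open with O_DIRECT, read the first 4 KiB, return elapsed seconds."""
        buf = mmap.mmap(-1, 4096)  # page-aligned buffer, required for O_DIRECT
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        try:
            start = time.monotonic()
            os.readv(fd, [buf])
            return time.monotonic() - start
        finally:
            os.close(fd)

    # Hypothetical paths: the multipath device backing the PV, and the metadata LV on it.
    DEVICES = ["/dev/mapper/<mpath-wwid>", "/dev/<vg-name>/metadata"]

    for _ in range(60):  # sample each device once per second for a minute
        for dev in DEVICES:
            print("%s: %.3f ms" % (dev, timed_4k_read(dev) * 1000))
        time.sleep(1)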
participants (3)
- Chris Adams
- Nir Soffer
- Yaniv Kaul