trouble when creating VM snapshots including memory

Hi,

I'm having trouble creating VM snapshots that include memory in my oVirt 4.1 test environment. When I do this, the VM gets paused, and shortly afterwards (20-30 s) I see messages in engine.log about both iSCSI storage domains (the master storage domain and the data storage domain where the VM resides) experiencing high latency. From the engine's view this quickly gets worse: the VM is unresponsive, the host is unresponsive, and the engine wants to fence the host (impossible, because it's the only host in the test cluster). In the end there is an EngineException:

EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)

The snapshot fails and is left in an inconsistent state. The situation has to be resolved manually with unlock_entity.sh and possibly lvm commands. This happened twice in exactly the same manner. Snapshots of this VM without memory are not a problem.

The VM guest OS is CentOS 7, installed from one of the ovirt-image-repository images. It has the oVirt guest agent running.

What could be wrong?

This is a test environment where many parameters aren't optimal, but I never had problems like this before, and nothing concerning network latency. iSCSI is on a FreeNAS box. CPU, RAM and Ethernet (10 Gbit for storage) on all hosts involved (engine hosted externally, oVirt Node, storage) should be more than adequate.

It looks like some obvious configuration botch or performance bottleneck to me. Can it be linked to the network roles (the management and migration networks are on a 1 Gbit link)?

I'm still new to this, and I don't have a lot of KVM experience either. Maybe someone recognizes the culprit...

thx
matthias

On Fri, Jun 9, 2017 at 3:39 PM, Matthias Leopold <matthias.leopold@meduniwien.ac.at> wrote:
> This is a test environment where many parameters aren't optimal, but I never had problems like this before, and nothing concerning network latency. iSCSI is on a FreeNAS box. CPU, RAM and Ethernet (10 Gbit for storage) on all hosts involved (engine hosted externally, oVirt Node, storage) should be more than adequate.
Are you sure iSCSI traffic is going over the 10 Gbit interfaces? If it isn't, it might choke the management interface. Regardless, how is the performance of the storage? I don't expect it to require much, but saving the memory might require some storage throughput. Perhaps there's a bottleneck there?
Y.
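To check which interface actually carries the iSCSI traffic, here is a minimal Python sketch, assuming a Linux host with iproute2 (`ip`) and sysfs. It asks the kernel which interface it would use to reach a given address and reads that interface's negotiated link speed from `/sys/class/net/<iface>/speed` (reported in Mbit/s). The function names and the example address are hypothetical helpers, not part of oVirt or VDSM.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional


def parse_route_dev(route_output: str) -> Optional[str]:
    """Extract the interface name that follows 'dev' in `ip route get` output."""
    match = re.search(r"\bdev\s+(\S+)", route_output)
    return match.group(1) if match else None


def interface_for(target_ip: str) -> Optional[str]:
    """Ask the kernel (via iproute2) which interface it would use for target_ip."""
    result = subprocess.run(
        ["ip", "route", "get", target_ip],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the kernel has no route
    )
    return parse_route_dev(result.stdout)


def link_speed_mbps(iface: str) -> Optional[int]:
    """Read the negotiated link speed in Mbit/s from sysfs; None if unavailable."""
    try:
        value = int(Path(f"/sys/class/net/{iface}/speed").read_text().strip())
    except (OSError, ValueError):
        # Missing, virtual or down interfaces may not expose a usable speed.
        return None
    return value if value > 0 else None
```

On the oVirt Node, something like `iface = interface_for("192.0.2.10")` (substituting the FreeNAS portal address for the hypothetical one) followed by `link_speed_mbps(iface)` should report 10000 if the 10 Gbit NIC is the one in use.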
> thx
> matthias
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

On 2017-06-11 at 10:11, Yaniv Kaul wrote:
> Are you sure iSCSI traffic is going over the 10 Gbit interfaces? If it isn't, it might choke the management interface. Regardless, how is the performance of the storage? I don't expect it to require much, but saving the memory might require some storage throughput. Perhaps there's a bottleneck there?
> Y.
I shot myself in the foot by also playing around with network QoS and forgetting about it... No wonder the network chokes when I tell it to. Without randomly applied QoS profiles, snapshots work perfectly ;-)

thx
matthias
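As a rough illustration of why a throttled link stalls a memory snapshot: the VM's RAM has to be written out to a volume on the storage domain, so the write time is bounded below by memory size divided by throughput. The Python sketch below is back-of-envelope arithmetic only. The 4 GiB VM size and the 100 Mbit/s QoS cap are hypothetical values (the thread gives neither), it assumes the QoS limit applied to the link carrying the iSCSI traffic, and it ignores protocol overhead and storage latency.

```python
def dump_seconds(memory_gib: float, throughput_mbit_s: float) -> float:
    """Lower bound in seconds to write memory_gib GiB at throughput_mbit_s Mbit/s."""
    bits = memory_gib * 1024**3 * 8                 # GiB -> bits
    return bits / (throughput_mbit_s * 1_000_000)   # Mbit/s -> bit/s


# Hypothetical 4 GiB VM written over different effective link rates.
for label, rate in [
    ("10 Gbit/s storage link", 10_000),
    ("1 Gbit/s link", 1_000),
    ("hypothetical 100 Mbit/s QoS cap", 100),
]:
    print(f"{label}: {dump_seconds(4, rate):.1f} s")
```

With these values it prints roughly 3.4 s, 34.4 s and 343.6 s, which shows how a bandwidth cap can stretch the memory dump from seconds into minutes while the VM stays paused.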
participants (2)
- Matthias Leopold
- Yaniv Kaul