On 2017-06-11 at 10:11, Yaniv Kaul wrote:
On Fri, Jun 9, 2017 at 3:39 PM, Matthias Leopold
<matthias.leopold(a)meduniwien.ac.at> wrote:
hi,
i'm having trouble creating VM snapshots that include memory in my
oVirt 4.1 test environment. when i do this the VM gets paused and
shortly (20-30s) afterwards i'm seeing messages in engine.log about
both iSCSI storage domains (master storage domain and data storage
where VM resides) experiencing high latency. this quickly worsens
from the engine's view: VM is unresponsive, Host is unresponsive,
engine wants to fence the host (impossible because it's the only
host in the test cluster). in the end there is an EngineException:
EngineException:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Message timeout which can
be caused by communication issues (Failed with error
VDS_NETWORK_ERROR and code 5022)
the snapshot fails and is left in an inconsistent state. the
situation has to be resolved manually with unlock_entity.sh and
maybe lvm commands. this happened twice in exactly the same manner.
VM snapshots without memory for this VM are not a problem.
VM guest OS is CentOS7 installed from one of the
ovirt-image-repository images. it has the oVirt guest agent running.
what could be wrong?
this is a test environment where lots of parameters aren't optimal
but i never had problems like this before, nothing concerning
network latency. iSCSI is on a FreeNAS box. CPU, RAM, ethernet
(10GBit for storage) on all hosts involved (engine hosted
externally, oVirt Node, storage) should be OK by far.
Are you sure iSCSI traffic is going over the 10Gb interfaces?
If it isn't, it might choke the mgmt interface.
Regardless, how is the performance of the storage? I don't expect it to
require too much, but saving the memory might require some storage
performance. Perhaps there's a bottleneck there?
Y.
i shot myself in the foot by also playing around with network QoS and
forgetting about it.... no wonder the network chokes when i tell it to
do so. without randomly applied QoS profiles snapshots work perfectly ;-)
thx
matthias
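
For anyone who hits the same symptom: a rough back-of-the-envelope sketch of why a bandwidth cap can hurt a snapshot with memory, which has to write the VM's RAM out to storage. The function name, the 4 GiB VM size and the 50 Mbit/s cap are all hypothetical values picked for illustration; none of them come from this thread, and real transfers will be slower than this lower bound because of protocol overhead.

```python
def memory_dump_seconds(ram_gib: float, link_mbit_per_s: float) -> float:
    """Lower bound on the time to push a VM's RAM through a rate-limited link."""
    ram_bits = ram_gib * 1024**3 * 8                   # GiB -> bits
    return ram_bits / (link_mbit_per_s * 1_000_000)    # Mbit/s -> bit/s


# Hypothetical numbers (assumptions, not measurements from this thread):
# a 4 GiB VM on an unrestricted 10 Gbit/s link vs. a 50 Mbit/s QoS cap.
for label, rate in [("10 Gbit/s", 10_000), ("50 Mbit/s QoS cap", 50)]:
    print(f"{label}: ~{memory_dump_seconds(4, rate):.0f} s")
```

Under these assumed numbers the capped link would need minutes rather than seconds, which would plausibly be long enough for storage latency warnings to pile up while the VM is paused.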