<div dir="ltr"><div><div><div>Checking the logs further, I see this error from libvirt on the host that is running the guest VM: <br><span style="font-family:monospace,monospace"><br>Apr 1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.298+0000: 1862: warning : qemuDomainObjBeginJobInternal:3847 : Cannot start job (query, none) for domain Data-Server; current job is (async nested, snapshot) owned by (1863 remoteDispatchDomainSnapshotCreateXML, 1863 remoteDispatchDomainSnapshotCreateXML) for (39s, 41s)<br>Apr 1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.299+0000: 1862: error : qemuDomainObjBeginJobInternal:3859 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainSnapshotCreateXML)<br>Apr 1 17:53:57 v0 journal: vdsm Executor WARN Worker blocked: <Worker name=jsonrpc/3 running <Task <JsonRpcTask {'params': {u'frozen': False, u'vmID': u'6bdb3d02-cc33-4019-97cd-7447aecc1e02', u'snapDrives': [{u'baseVolumeID': u'adfabed5-451b-4f46-b22a-45f720b06110', u'domainID': u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID': u'cc0d0772-924c-46db-8ad6-a2b0897c313f', u'imageID': u'7eeadedc-f247-4a31-840d-4de622bf3541'}, {u'baseVolumeID': u'0d960c12-3bcf-4918-896d-bd8e68b5278b', u'domainID': u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID': u'590a6bdd-a9e2-444e-87bc-721c5f8586eb', u'imageID': u'da0e4111-6bbe-43cb-bf59-db5fbf5c3e38'}]}, 'jsonrpc': '2.0', 'method': u'VM.snapshot', 'id': u'be7912e6-ba3d-4357-8ba1-abe40825acf1'} at 0x3bf84d0> timeout=60, duration=60 at 0x3bf8050> task#=416 at 0x20d7f90></span><br><br><br></div>Immediately after the above, the engine reports the VM as unresponsive. <br></div>The SPM host does not log any issues. <br><br></div><div>At the same time, the 3 hosts are fairly idle, with only one guest VM running. The gluster traffic is dedicated to a separate Gbit NIC of the servers (dedicated VLAN), while the management network is on a separate network. 
The gluster traffic does not exceed 40 Mbps during the snapshot operation. I can't understand why libvirt is logging a timeout. <br></div><div><br></div>Alex<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 29, 2018 at 9:42 PM, Alex K <span dir="ltr"><<a href="mailto:rightkicktech@gmail.com" target="_blank">rightkicktech@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>Any ideas on this issue?<br></div>I am still trying to understand what may be causing this issue.<br><br></div>Many thanks for any assistance. <br><span class="HOEnZb"><font color="#888888"><br></font></span></div><span class="HOEnZb"><font color="#888888">Alex<br></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 28, 2018 at 10:06 AM, Yedidyah Bar David <span dir="ltr"><<a href="mailto:didi@redhat.com" target="_blank">didi@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Tue, Mar 27, 2018 at 3:38 PM, Sandro Bonazzola <span dir="ltr"><<a href="mailto:sbonazzo@redhat.com" target="_blank">sbonazzo@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="m_-6967011413672980483m_-3684008751660998991gmail-">2018-03-27 14:34 GMT+02:00 Alex K <span dir="ltr"><<a href="mailto:rightkicktech@gmail.com" target="_blank">rightkicktech@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><div><div>Hi All, <br><br></div>Any ideas on the below?<br><br></div>I 
am using oVirt Guest Tools 4.2-1.el7.centos for the VM. <br></div>The Windows 2016 server VM (the one with the relatively big disks: 500 GB) is consistently rendered unresponsive when I try to take a snapshot. <br></div><div>I can provide any additional logs if needed. <br></div></div></blockquote><div><br></div></span><div>Adding some people to the thread</div></div></div></div></blockquote><div><br></div></span><div>Adding more people for this part.<br></div><div><div class="m_-6967011413672980483h5"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="m_-6967011413672980483m_-3684008751660998991gmail-h5"><div dir="ltr"><div></div><span class="m_-6967011413672980483m_-3684008751660998991gmail-m_8584560152889258411HOEnZb"><font color="#888888"><div><br></div>Alex<br></font></span></div><div class="m_-6967011413672980483m_-3684008751660998991gmail-m_8584560152889258411HOEnZb"><div class="m_-6967011413672980483m_-3684008751660998991gmail-m_8584560152889258411h5"><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Mar 25, 2018 at 7:30 PM, Alex K <span dir="ltr"><<a href="mailto:rightkicktech@gmail.com" target="_blank">rightkicktech@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><div><div><div>Hi folks, <br><br></div>I am frequently facing the following issue: <br><br></div>On some large VMs (Windows 2016 with two disk drives, 60 GB and 500 GB), attempting to create a snapshot of the VM makes the VM unresponsive. 
<br><br></div>The errors that I managed to collect were: <br><br>vdsm error at host hosting the VM: <br><span style="font-family:monospace,monospace">2018-03-25 14:40:13,442+0000 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/7 running <Task <JsonRpcTask {'params': {u'frozen': False, u'vmID': u'a5c761a2-41cd-40c2-b65f-f381<wbr>9293e8a4', u'snapDrives': [{u'baseVolumeID': u'2a33e585-ece8-4f4d-b45d-5ecc<wbr>9239200e', u'domainID': u'888e3aae-f49f-42f7-a7fa-7670<wbr>0befabea', u'volumeID': u'e9a01ebd-83dd-40c3-8c83-5302<wbr>b0d15e04', u'imageID': u'c75b8e93-3067-4472-bf24-dafa<wbr>da224e4d'}, {u'baseVolumeID': u'3fb2278c-1b0d-4677-a529-9908<wbr>4e4b08af', u'domainID': u'888e3aae-f49f-42f7-a7fa-7670<wbr>0befabea', u'volumeID': u'78e6b6b1-2406-4393-8d92-831a<wbr>6d4f1337', u'imageID': u'd4223744-bf5d-427b-bec2-f14b<wbr>9bc2ef81'}]}, 'jsonrpc': '2.0', 'method': u'VM.snapshot', 'id': u'89555c87-9701-4260-9952-7899<wbr>65261e65'} at 0x7fca4004cc90> timeout=60, duration=60 at 0x39d8210> task#=155842 at 0x2240e10> (executor:351)<br>2018-03-25 14:40:15,261+0000 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.01 seconds (__init__:539)<br>2018-03-25 14:40:17,471+0000 WARN (jsonrpc/5) [virt.vm] (vmId='a5c761a2-41cd-40c2-b65f<wbr>-f3819293e8a4') monitor became unresponsive (command timeout, age=67.9100000001) (vm:5132)</span><br><br>engine.log: <br><span style="font-family:monospace,monospace">2018-03-25 14:40:19,875Z WARN [org.ovirt.engine.core.dal.dbb<wbr>roker.auditloghandling.AuditLo<wbr>gDirector] (DefaultQuartzScheduler2) [1d737df7] EVENT_ID: VM_NOT_RESPONDING(126), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM Data-Server is not responding.<br><br>2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.dal.dbb<wbr>roker.auditloghandling.AuditLo<wbr>gDirector] (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c<wbr>7cd37aa] EVENT_ID: 
VDS_BROKER_COMMAND_FAILURE(10,<wbr>802), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM v1.cluster command SnapshotVDS failed: Message timeout which can be caused by communication issues<br>2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.vdsbrok<wbr>er.vdsbroker.SnapshotVDSComman<wbr>d] (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c<wbr>7cd37aa] Command 'SnapshotVDSCommand(HostName = v1.cluster, SnapshotVDSCommandParameters:{<wbr>runAsync='true', hostId='a713d988-ee03-4ff0-a0c<wbr>d-dc4cde1507f4', vmId='a5c761a2-41cd-40c2-b65f-<wbr>f3819293e8a4'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues<br>2018-03-25 14:42:13,708Z WARN [org.ovirt.engine.core.bll.sna<wbr>pshots.CreateAllSnapshotsFromV<wbr>mCommand] (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c<wbr>7cd37aa] Could not perform live snapshot due to error, VM will still be configured to the new created snapshot: EngineException: org.ovirt.engine.core.vdsbroke<wbr>r.vdsbroker.VDSNetworkExceptio<wbr>n: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)<br>2018-03-25 14:42:13,708Z WARN [org.ovirt.engine.core.vdsbrok<wbr>er.VdsManager] (org.ovirt.thread.pool-6-threa<wbr>d-15) [17789048-009a-454b-b8ad-2c72c<wbr>7cd37aa] Host 'v1.cluster' is not responding. It will stay in Connecting state for a grace period of 61 seconds and after that an attempt to fence the host will be issued.<br>2018-03-25 14:42:13,725Z WARN [org.ovirt.engine.core.dal.dbb<wbr>roker.auditloghandling.AuditLo<wbr>gDirector] (org.ovirt.thread.pool-6-threa<wbr>d-15) [17789048-009a-454b-b8ad-2c72c<wbr>7cd37aa] EVENT_ID: VDS_HOST_NOT_RESPONDING_CONNEC<wbr>TING(9,008), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Host v1.cluster is not responding. 
It will stay in Connecting state for a grace period of 61 seconds and after that an attempt to fence the host will be issued.<br>2018-03-25 14:42:13,751Z WARN [org.ovirt.engine.core.dal.dbb<wbr>roker.auditloghandling.AuditLo<wbr>gDirector] (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c<wbr>7cd37aa] EVENT_ID: USER_CREATE_LIVE_SNAPSHOT_FINI<wbr>SHED_FAILURE(170), Correlation ID: 17789048-009a-454b-b8ad-2c72c7<wbr>cd37aa, Job ID: 16e48c28-a8c7-4841-bd81-1f2d37<wbr>0f345d, Call Stack: org.ovirt.engine.core.common.e<wbr>rrors.EngineException: EngineException: org.ovirt.engine.core.vdsbroke<wbr>r.vdsbroker.VDSNetworkExceptio<wbr>n: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)<br>2018-03-25 14:42:14,372Z ERROR [org.ovirt.engine.core.dal.dbb<wbr>roker.auditloghandling.AuditLo<wbr>gDirector] (DefaultQuartzScheduler5) [] EVENT_ID: USER_CREATE_SNAPSHOT_FINISHED_<wbr>FAILURE(69), Correlation ID: 17789048-009a-454b-b8ad-2c72c7<wbr>cd37aa, Job ID: 16e48c28-a8c7-4841-bd81-1f2d37<wbr>0f345d, Call Stack: org.ovirt.engine.core.common.e<wbr>rrors.EngineException: EngineException: org.ovirt.engine.core.vdsbroke<wbr>r.vdsbroker.VDSNetworkExceptio<wbr>n: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)<br>2018-03-25 14:42:14,372Z WARN [org.ovirt.engine.core.bll.Con<wbr>currentChildCommandsExecutionC<wbr>allback] (DefaultQuartzScheduler5) [] Command 'CreateAllSnapshotsFromVm' id: 'bad4f5be-5306-413f-a86a-513b3<wbr>cfd3c66' end method execution failed, as the command isn't marked for endAction() retries silently ignoring<br>2018-03-25 14:42:15,951Z WARN [org.ovirt.engine.core.dal.dbb<wbr>roker.auditloghandling.AuditLo<wbr>gDirector] (DefaultQuartzScheduler5) [5017c163] EVENT_ID: VDS_NO_SELINUX_ENFORCEMENT(25)<wbr>, Correlation ID: null, Call Stack: null, Custom ID: 
null, Custom Event ID: -1, Message: Host v1.cluster does not enforce SELinux. Current status: DISABLED<br>2018-03-25 14:42:15,951Z WARN [org.ovirt.engine.core.vdsbrok<wbr>er.VdsManager] (DefaultQuartzScheduler5) [5017c163] Host 'v1.cluster' is running with SELinux in 'DISABLED' mode<br></span><br>As soon as the VM becomes unresponsive, the VM console that was already open freezes. I can recover the VM only by powering it off and on again. <br></div><br>I am using oVirt 4.1.9 with 3 nodes and a self-hosted engine. I am running mostly Windows 10 and Windows 2016 server VMs. I have installed the latest guest agents from: <br><br><a href="http://resources.ovirt.org/pub/ovirt-4.2/iso/oVirt-toolsSetup/4.2-1.el7.centos/" target="_blank">http://resources.ovirt.org/pub<wbr>/ovirt-4.2/iso/oVirt-toolsSetu<wbr>p/4.2-1.el7.centos/</a><br><br></div><div>On the screen where one takes a snapshot I get a warning saying "Could not detect guest agent on the VM. Note that without guest agent the data on the created snapshot may be inconsistent". See attached. I have verified that the oVirt guest tools are installed and shown under installed apps in the engine GUI. Also, Ovirt Guest Agent (32 bit) and qemu-ga are listed as running in the Windows Task Manager. 
Shouldn't the oVirt guest agent be 64-bit on 64-bit Windows?<br></div></div></blockquote></div></div></div></div></div></div></blockquote></div></div></div></blockquote><div><br></div></div></div><div>No idea, but I do not think it's related to your problem of freezing while taking a snapshot.<br><br></div><div>This error was already discussed in the past, see e.g.:<br><br><a href="http://lists.ovirt.org/pipermail/users/2017-June/082577.html" target="_blank">http://lists.ovirt.org/piperma<wbr>il/users/2017-June/082577.html</a><br><br></div><div>Best regards,<span class="m_-6967011413672980483HOEnZb"><font color="#888888"><br></font></span></div></div><span class="m_-6967011413672980483HOEnZb"><font color="#888888">-- <br><div class="m_-6967011413672980483m_-3684008751660998991gmail_signature">Didi<br></div>
</font></span></div></div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>