AlexThanx Yaniv,I will try to follow your advice.On Tue, Apr 3, 2018 at 8:54 PM, Yaniv Kaul <ykaul@redhat.com> wrote:On Sun, Apr 1, 2018 at 9:04 PM, Alex K <rightkicktech@gmail.com> wrote:Checking further the logs I see this error given from libvirt of the host that has the guest VM running:Immediately after above the engine reports the VM as unresponsive.
Apr 1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.298+0000: 1862: warning : qemuDomainObjBeginJobInternal:3847 : Cannot start job (query, none) for domain Data-Server; current job is (async nested, snapshot) owned by (1863 remoteDispatchDomainSnapshotCreateXML, 1863 remoteDispatchDomainSnapshotCreateXML) for (39s, 41s)
Apr 1 17:53:41 v0 libvirtd: 2018-04-01 17:53:41.299+0000: 1862: error : qemuDomainObjBeginJobInternal:3859 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainSnapshotCreateXML)
Apr 1 17:53:57 v0 journal: vdsm Executor WARN Worker blocked: <Worker name=jsonrpc/3 running <Task <JsonRpcTask {'params': {u'frozen': False, u'vmID': u'6bdb3d02-cc33-4019-97cd-7447aecc1e02', u'snapDrives': [{u'baseVolumeID': u'adfabed5-451b-4f46-b22a-45f720b06110', u'domainID': u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID': u'cc0d0772-924c-46db-8ad6-a2b0897c313f', u'imageID': u'7eeadedc-f247-4a31-840d-4de622bf3541'}, {u'baseVolumeID': u'0d960c12-3bcf-4918-896d-bd8e68b5278b', u'domainID': u'2c4b8d45-3d05-4619-9a36-1ecd199d3056', u'volumeID': u'590a6bdd-a9e2-444e-87bc-721c5f8586eb', u'imageID': u'da0e4111-6bbe-43cb-bf59-db5fbf5c3e38'}]}, 'jsonrpc': '2.0', 'method': u'VM.snapshot', 'id': u'be7912e6-ba3d-4357-8ba1-abe40825acf1'} at 0x3bf84d0> timeout=60, duration=60 at 0x3bf8050> task#=416 at 0x20d7f90>If it's reproducible, it'd be great if you could:1. Run libvirt with debug logs2. Get a dump of libvirt when this happens (with gstack for example).3. File a bug on libvirt.Thanks,Y._______________________________________________The SPM host does not log any issues.In the same time, the 3 hosts are fairly idle with only one running guest VM. The gluster traffic is dedicated to a separate Gbit NIC of the servers (dedicated VLAN) while the management network is on a separate network. The gluster traffic does not exceed 40 Mbps during the snapshot operation. Can't understand why libvirt is logging timeout.AlexOn Thu, Mar 29, 2018 at 9:42 PM, Alex K <rightkicktech@gmail.com> wrote:AlexMany thanx for any assistance.Any idea with this issue?I am still trying to understand what may be causing this issue.On Wed, Mar 28, 2018 at 10:06 AM, Yedidyah Bar David <didi@redhat.com> wrote:On Tue, Mar 27, 2018 at 3:38 PM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:--2018-03-27 14:34 GMT+02:00 Alex K <rightkicktech@gmail.com>:The Window 2016 server VM (which it the one with the relatively big disks: 500 GB) it is consistently rendered unresponsive when trying to get a snapshot.I am using oVirt Guest Tools 4.2-1.el7.centos for the VM.Hi All,Any idea on the below?I amy provide any other additional logs if needed.Adding some people to the threadAdding more people for this part.AlexOn Sun, Mar 25, 2018 at 7:30 PM, Alex K <rightkicktech@gmail.com> wrote:The errors that I managed to collect were:On some large VMs (Windows 2016 with two disk drives, 60GB and 500GB) when attempting to create a snapshot of the VM, the VM becomes unresponsive.Hi folks,I am facing frequently the following issue:
vdsm error at host hosting the VM:
2018-03-25 14:40:13,442+0000 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/7 running <Task <JsonRpcTask {'params': {u'frozen': False, u'vmID': u'a5c761a2-41cd-40c2-b65f-f3819293e8a4', u'snapDrives': [{u'baseVolumeID': u'2a33e585-ece8-4f4d-b45d-5ecc9239200e', u'domainID': u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID': u'e9a01ebd-83dd-40c3-8c83-5302b0d15e04', u'imageID': u'c75b8e93-3067-4472-bf24-dafada224e4d'}, {u'baseVolumeID': u'3fb2278c-1b0d-4677-a529-99084e4b08af', u'domainID': u'888e3aae-f49f-42f7-a7fa-76700befabea', u'volumeID': u'78e6b6b1-2406-4393-8d92-831a6d4f1337', u'imageID': u'd4223744-bf5d-427b-bec2-f14b9bc2ef81'}]}, 'jsonrpc': '2.0', 'method': u'VM.snapshot', 'id': u'89555c87-9701-4260-9952-789965261e65'} at 0x7fca4004cc90> timeout=60, duration=60 at 0x39d8210> task#=155842 at 0x2240e10> (executor:351)
2018-03-25 14:40:15,261+0000 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.01 seconds (__init__:539)
2018-03-25 14:40:17,471+0000 WARN (jsonrpc/5) [virt.vm] (vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4') monitor became unresponsive (command timeout, age=67.9100000001) (vm:5132)
engine.log:
2018-03-25 14:40:19,875Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [1d737df7] EVENT_ID: VM_NOT_RESPONDING(126), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM Data-Server is not responding.
2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM v1.cluster command SnapshotVDS failed: Message timeout which can be caused by communication issues
2018-03-25 14:42:13,708Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] Command 'SnapshotVDSCommand(HostName = v1.cluster, SnapshotVDSCommandParameters:{runAsync='true', hostId='a713d988-ee03-4ff0-a0cd-dc4cde1507f4', vmId='a5c761a2-41cd-40c2-b65f-f3819293e8a4'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2018-03-25 14:42:13,708Z WARN [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] Could not perform live snapshot due to error, VM will still be configured to the new created snapshot: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)
2018-03-25 14:42:13,708Z WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-6-thread-15) [17789048-009a-454b-b8ad-2c72c7cd37aa] Host 'v1.cluster' is not responding. It will stay in Connecting state for a grace period of 61 seconds and after that an attempt to fence the host will be issued.
2018-03-25 14:42:13,725Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-15) [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID: VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Host v1.cluster is not responding. It will stay in Connecting state for a grace period of 61 seconds and after that an attempt to fence the host will be issued.
2018-03-25 14:42:13,751Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5) [17789048-009a-454b-b8ad-2c72c7cd37aa] EVENT_ID: USER_CREATE_LIVE_SNAPSHOT_FINISHED_FAILURE(170), Correlation ID: 17789048-009a-454b-b8ad-2c72c7cd37aa, Job ID: 16e48c28-a8c7-4841-bd81-1f2d370f345d, Call Stack: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)
2018-03-25 14:42:14,372Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5) [] EVENT_ID: USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Correlation ID: 17789048-009a-454b-b8ad-2c72c7cd37aa, Job ID: 16e48c28-a8c7-4841-bd81-1f2d370f345d, Call Stack: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)
2018-03-25 14:42:14,372Z WARN [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [] Command 'CreateAllSnapshotsFromVm' id: 'bad4f5be-5306-413f-a86a-513b3cfd3c66' end method execution failed, as the command isn't marked for endAction() retries silently ignoring
2018-03-25 14:42:15,951Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5) [5017c163] EVENT_ID: VDS_NO_SELINUX_ENFORCEMENT(25), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Host v1.cluster does not enforce SELinux. Current status: DISABLED
2018-03-25 14:42:15,951Z WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler5) [5017c163] Host 'v1.cluster' is running with SELinux in 'DISABLED' mode
As soon as the VM is unresponsive, the VM console that was already open freezes. I can resume the VM only by powering off and on.
I am using ovirt 4.1.9 with 3 nodes and self-hosted engine. I am running mostly Windows 10 and Windows 2016 server VMs. I have installed latest guest agents from:
http://resources.ovirt.org/pub/ovirt-4.2/iso/oVirt-toolsSetup/4.2-1.el7.centos/At the screen where one takes a snapshot I get a warning saying "Could not detect guest agent on the VM. Note that without guest agent the data on the created snapshot may be inconsistent". See attached. I have verified that ovirt guest tools are installed and shown at installed apps at engine GUI. Also Ovirt Guest Agent (32 bit) and qemu-ga are listed as running at the windows tasks manager. Shouldn't ovirt guest agent be 64 bit on Windows 64 bit?No idea, but I do not think it's related to your problem of freezing while taking a snapshot.This error was already discussed in the past, see e.g.:
http://lists.ovirt.org/pipermail/users/2017-June/082577.htmlBest regards,Didi
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users