Cinder Snapshot Issues

Hello,
I'm running into a problem with live snapshots not working when using Cinder/Ceph disks. The failures differ depending on whether memory is included, but in each case Cinder/Ceph creates a new snapshot that can be seen in both Cinder and Ceph.
When taking a memory+disk snapshot, the VM ends up in a paused state and I need to kill -9 the qemu process before the VM can be booted again. The engine seems to lose its connection to the vdsm process on the VM host after freezing the guest's filesystems. The guest never receives the thaw command, and it fails in the logs. Here are some log snippets:
    2016-04-12 19:24:58,851 INFO [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-8-thread-27) [5c4493e] Ending command 'org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand' successfully.
    2016-04-12 19:27:56,873 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-27) [4d97ca06] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM OVCL1A command failed: Message timeout which can be caused by communication issues
    2016-04-12 19:27:56,873 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler_Worker-27) [4d97ca06] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=5022, message=Message timeout which can be caused by communication issues]]'
    2016-04-12 19:27:56,874 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler_Worker-27) [4d97ca06] HostName = OVCL1A
    2016-04-12 19:27:56,874 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler_Worker-27) [4d97ca06] Command 'SnapshotVDSCommand(HostName = OVCL1A, SnapshotVDSCommandParameters:{runAsync='true', hostId='9bdfaedc-34a8-4a08-ad8a-c117835a6094', vmId='040609f6-cfe0-4763-8b32-08ffad158c93'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
    2016-04-12 19:27:56,875 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-8-thread-16) [4d97ca06] Host 'OVCL1A' is not responding.
Disk-only live snapshots freeze the guest filesystems, and the VM does receive the thaw command, but the VM is no longer responsive afterwards. It still answers pings on the network, but it is hung and also needs a kill -9 of the qemu process before it can be booted again.
    jsonrpc.Executor/0::DEBUG::2016-04-12 19:41:58,342::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.snapshot' in bridge with {u'frozen': True, u'vmID': u'040609f6-cfe0-4763-8b32-08ffad158c93', u'snapDrives': []}
    jsonrpc.Executor/0::INFO::2016-04-12 19:41:58,343::vm::3237::virt.vm::(snapshot) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::<domainsnapshot> <disks/> </domainsnapshot>
    jsonrpc.Executor/0::ERROR::2016-04-12 19:41:58,346::vm::3252::virt.vm::(snapshot) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::Unable to take snapshot
    Traceback (most recent call last):
      File "/usr/share/vdsm/virt/vm.py", line 3250, in snapshot
        self._dom.snapshotCreateXML(snapxml, snapFlags)
      File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
        ret = attr(*args, **kwargs)
      File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
        ret = f(*args, **kwargs)
      File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
        return func(inst, *args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2581, in snapshotCreateXML
        if ret is None:raise libvirtError('virDomainSnapshotCreateXML() failed', dom=self)
    libvirtError: unsupported configuration: nothing selected for snapshot
    jsonrpc.Executor/7::DEBUG::2016-04-12 19:41:58,391::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.thaw' in bridge with {u'vmID': u'040609f6-cfe0-4763-8b32-08ffad158c93'}
    jsonrpc.Executor/7::INFO::2016-04-12 19:41:58,391::vm::3041::virt.vm::(thaw) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::Thawing guest filesystems
    jsonrpc.Executor/7::INFO::2016-04-12 19:41:58,396::vm::3056::virt.vm::(thaw) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::6 guest filesystems thawed
Everything else is working well with Cinder for running VMs (creating disks, running VMs, live migration, etc.). I was able to take live snapshots when using a CephFS POSIX storage domain.
Versions: Ceph 9.2.0, oVirt (latest), CentOS 7.2, Cinder 7.0.1-1.el7
Any help would be appreciated.
Thanks,
Kevin
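In the disk-only vdsm log above, VM.snapshot is called with 'snapDrives': [], so the logged domainsnapshot document contains an empty disks element, and libvirt rejects it with "nothing selected for snapshot" before the thaw is sent. Presumably the Cinder disk had already been snapshotted on the Ceph side, leaving libvirt nothing to select. A minimal sketch of a guard that refuses to build such a request, assuming drives are described by hypothetical dicts with "name" (target device) and "path" (new overlay file) keys; this illustrates the idea and is not vdsm's actual code:

```python
import xml.etree.ElementTree as ET


class NothingToSnapshot(ValueError):
    """Raised when a snapshot request selects no disks at all."""


def build_snapshot_xml(snap_drives):
    """Build a libvirt <domainsnapshot> document from hypothetical drive dicts.

    Each dict is assumed to have "name" (e.g. "vda") and "path" (the new
    external overlay file). Memory snapshots are out of scope for this sketch.
    """
    if not snap_drives:
        # An empty <disks/> is what libvirt answered with
        # "nothing selected for snapshot" in the log above, so stop here.
        raise NothingToSnapshot("no drives selected for snapshot")

    root = ET.Element("domainsnapshot")
    disks = ET.SubElement(root, "disks")
    for drive in snap_drives:
        disk = ET.SubElement(disks, "disk", name=drive["name"], snapshot="external")
        ET.SubElement(disk, "source", file=drive["path"])
    return ET.tostring(root, encoding="unicode")
```

Whether skipping the libvirt call is the right behaviour for Cinder-backed disks is a design question for vdsm itself; the sketch only shows where an empty selection can be detected before libvirt is involved.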

On Tue, Apr 12, 2016 at 11:13 PM, Kevin Hrpcek <khrpcek@gmail.com> wrote:
[quoted text trimmed]
It could be a guest agent issue. Please make sure that ovirt-guest-agent and qemu-guest-agent are installed and running in the VM. Further details are available at: http://www.ovirt.org/documentation/internal/guest-agent/understanding-guest-... In addition, could you please attach the full engine/vdsm logs?
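One way to check the qemu-guest-agent part of this from the host is to talk to the agent directly. The sketch below sends the agent's guest-ping and guest-fsfreeze-status commands and reports whether the agent answers and whether the guest filesystems are still frozen. The transport is injected as a plain callable so the logic can be exercised without a host; wiring it to libvirt_qemu.qemuAgentCommand is an assumption about the reader's setup, and agent_status is a hypothetical helper name:

```python
import json


def agent_status(send):
    """Ping qemu-guest-agent and report the guest filesystem freeze state.

    `send` takes a guest-agent JSON command string and returns the agent's
    JSON reply string (hypothetical transport; see the note below).
    """
    ping = json.loads(send(json.dumps({"execute": "guest-ping"})))
    if "return" not in ping:
        # The agent replied with an error object instead of a result.
        return {"agent": "unresponsive", "fsfreeze": None}
    status = json.loads(send(json.dumps({"execute": "guest-fsfreeze-status"})))
    # qemu-guest-agent reports "thawed" or "frozen" here.
    return {"agent": "ok", "fsfreeze": status.get("return")}
```

On a host with the libvirt Python bindings, send could be lambda cmd: libvirt_qemu.qemuAgentCommand(dom, cmd, 10, 0), where dom is the libvirt domain of the VM and 10 is a timeout in seconds; a timeout then surfaces as a libvirt exception rather than a return value. On oVirt hosts, a read-write libvirt connection may require the SASL credentials that vdsm configures. The ovirt-guest-agent service has to be checked separately inside the guest.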
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

The guest agent processes are running. I recreated the error today by taking a live snapshot with memory. The snapshot was initiated at 13:05:42,854; the VM name is 'ov1' and the snapshot name is 'prodsnap10'. The engine log goes back far enough that you can see previous failures as well, but the vdsm log only covers today.
Kevin
On Mon, May 23, 2016 at 3:31 AM, Daniel Erez <derez@redhat.com> wrote:
[quoted text trimmed]
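Kevin's reproduction (VM 'ov1', snapshot 'prodsnap10', started at 13:05:42,854) is easiest to follow by pulling the vdsm records for that VM out of the full log. A small parser, assuming the record layout visible in the snippets in this thread (thread::LEVEL::timestamp::module::lineno::logger::(function) message); parse_vdsm_line and errors_for_vm are hypothetical helper names:

```python
from datetime import datetime


def parse_vdsm_line(line):
    """Split one vdsm log record into named fields.

    Assumed layout: thread::LEVEL::timestamp::module::lineno::logger::(function) message
    The message may itself contain "::", so only the first six separators split.
    """
    thread, level, stamp, module, lineno, logger, rest = line.split("::", 6)
    function, _, message = rest.partition(") ")
    return {
        "thread": thread,
        "level": level,
        "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"),
        "module": module,
        "lineno": int(lineno),
        "logger": logger,
        "function": function.lstrip("("),
        "message": message,
    }


def errors_for_vm(lines, vm_id):
    """Yield parsed ERROR records whose text mentions the given VM id."""
    for line in lines:
        if "::ERROR::" in line and vm_id in line:
            yield parse_vdsm_line(line)
```

Traceback lines that follow an ERROR record do not match the filter, so they still have to be read from the raw log around the returned timestamps.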

According to the VDSM log [1], there was a timeout error during the snapshot operation. This could be a duplicate of bugs [2] that are already resolved in the latest version. Could you please provide the versions of the following components for further investigation: engine, vdsm, qemu-kvm-rhev and libvirt? Also, please attach the libvirt/qemu logs.
[1] jsonrpc.Executor/6::ERROR::2016-05-23 13:09:46,790::vm::3311::virt.vm::(snapshot) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::Unable to take snapshot
    Traceback (most recent call last):
      File "/usr/share/vdsm/virt/vm.py", line 3309, in snapshot
        self._dom.snapshotCreateXML(snapxml, snapFlags)
      File "/usr/share/vdsm/virt/virdomain.py", line 76, in f
        raise toe
    TimeoutError: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainSnapshotCreateXML)
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1261980
    https://bugzilla.redhat.com/show_bug.cgi?id=1250839
On Mon, May 23, 2016 at 11:31 AM, Daniel Erez <derez@redhat.com> wrote:
[quoted text trimmed]
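The error in [1] comes from libvirt's per-domain job handling: a state-changing call on a domain has to wait for the domain's current job to finish, and it gives up with "cannot acquire state change lock (held by ...)" if that does not happen in time. Here the holder is remoteDispatchDomainSnapshotCreateXML, i.e. another snapshot-creation call on the same VM. A plain-Python analogy using threading.Lock with an acquire timeout; this is not libvirt's implementation, and the names and timings are made up for illustration:

```python
import threading
import time


def try_state_change(lock, timeout):
    """Return True if the (hypothetical) per-VM state lock was acquired in time."""
    if lock.acquire(timeout=timeout):
        lock.release()
        return True
    return False


def demo(hold_for=0.5, wait_for=0.1):
    """Show a second request timing out while a long 'snapshot' holds the lock."""
    state_lock = threading.Lock()
    held = threading.Event()

    def long_snapshot():
        # Stands in for a snapshot job that keeps the lock while it runs.
        with state_lock:
            held.set()
            time.sleep(hold_for)

    worker = threading.Thread(target=long_snapshot)
    worker.start()
    held.wait()  # make sure the snapshot really holds the lock first
    second_ok = try_state_change(state_lock, wait_for)  # arrives mid-snapshot
    worker.join()
    later_ok = try_state_change(state_lock, wait_for)  # after the job ended
    return second_ok, later_ok
```

With the default timings, demo() returns (False, True): the request that arrives while the snapshot holds the lock times out, and the same request succeeds once the snapshot job has finished.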

The qemu log for a Cinder-based VM is attached. Current versions:
    ovirt-engine-3.6.5.3-1.el7.centos.noarch
    vdsm-4.17.26-0.el7.centos.noarch
    qemu-kvm-ev-2.3.0-31.el7_2.10.1.x86_64
    libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
    libvirt-daemon-1.2.17-13.el7_2.4.x86_64
On Wed, May 25, 2016 at 7:59 AM, Daniel Erez <derez@redhat.com> wrote:
[quoted text trimmed]
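To compare the installed packages Kevin listed against the fixed versions recorded in the bugs under [2], it helps to split each RPM name into name, version, release and architecture. A minimal sketch that relies on RPM's naming convention (architecture after the last dot, version and release as the last two dash-separated fields); it ignores epochs, and parse_nvra is a hypothetical helper name:

```python
def parse_nvra(package):
    """Split an RPM name-version-release.arch string into its parts.

    Assumes no epoch is present (e.g. "vdsm-4.17.26-0.el7.centos.noarch").
    """
    rest, _, arch = package.rpartition(".")
    # Package names may contain dashes, so split only the last two fields off.
    name, version, release = rest.rsplit("-", 2)
    return {"name": name, "version": version, "release": release, "arch": arch}
```

For actually ordering two versions, the rpm Python bindings provide rpm.labelCompare, which applies RPM's own comparison rules; plain string comparison gets cases such as "1.2.17" versus "1.2.9" wrong.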
Participants (2): Daniel Erez, Kevin Hrpcek