On Tue, Apr 12, 2016 at 11:13 PM, Kevin Hrpcek <khrpcek@gmail.com> wrote:
Hello,

I'm running into a problem with live snapshots not working when using Cinder/Ceph disks. The failure differs depending on whether the snapshot includes memory, but in both cases Cinder/Ceph creates a new snapshot that is visible in both Cinder and Ceph. When taking a memory-and-disk snapshot, the VM ends up in a paused state and I have to kill -9 the qemu process before I can boot the VM again. The engine seems to lose its connection to the vdsm process on the VM host after the guest's filesystems are frozen. The guest never receives the thaw command, and the logs show the snapshot command failing. Here are some log snippets.

2016-04-12 19:24:58,851 INFO  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-8-thread-27) [5c4493e] Ending command 'org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand' successfully.
2016-04-12 19:27:56,873 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-27) [4d97ca06] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM OVCL1A command failed: Message timeout which can be caused by communication issues
2016-04-12 19:27:56,873 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler_Worker-27) [4d97ca06] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=5022, message=Message timeout which can be caused by communication issues]]'
2016-04-12 19:27:56,874 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler_Worker-27) [4d97ca06] HostName = OVCL1A
2016-04-12 19:27:56,874 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler_Worker-27) [4d97ca06] Command 'SnapshotVDSCommand(HostName = OVCL1A, SnapshotVDSCommandParameters:{runAsync='true', hostId='9bdfaedc-34a8-4a08-ad8a-c117835a6094', vmId='040609f6-cfe0-4763-8b32-08ffad158c93'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2016-04-12 19:27:56,875 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-8-thread-16) [4d97ca06] Host 'OVCL1A' is not responding.
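
To put a number on the delay in the engine log above, here is a minimal sketch that computes the time between the engine reporting CreateAllSnapshotsFromVmCommand as ended and the VDSM timeout error. The two timestamps are copied from the log lines; the helper name seconds_between is hypothetical, and whether the gap matches any configured engine-to-vdsm timeout is not something these logs confirm.

```python
from datetime import datetime

# Timestamps copied from the engine log lines above.
SNAPSHOT_CMD_ENDED = "2016-04-12 19:24:58,851"
VDSM_TIMEOUT_LOGGED = "2016-04-12 19:27:56,873"

# Engine log timestamps use a comma before the milliseconds.
ENGINE_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two engine-log timestamps."""
    t0 = datetime.strptime(start, ENGINE_TS_FORMAT)
    t1 = datetime.strptime(end, ENGINE_TS_FORMAT)
    return (t1 - t0).total_seconds()


gap = seconds_between(SNAPSHOT_CMD_ENDED, VDSM_TIMEOUT_LOGGED)
print(f"{gap:.3f} s")  # prints "178.022 s"
```

So the host is reported as not responding roughly three minutes after the snapshot command was logged as ended.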

With disk-only live snapshots, the guest filesystems are frozen and the VM does receive the thaw command, but the VM is unresponsive afterwards. It still answers pings on the network, but it is hung, and it also needs a kill -9 of the qemu process before it can be booted again.

jsonrpc.Executor/0::DEBUG::2016-04-12 19:41:58,342::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.snapshot' in bridge with {u'frozen': True, u'vmID': u'040609f6-cfe0-4763-8b32-08ffad158c93', u'snapDrives': []}
jsonrpc.Executor/0::INFO::2016-04-12 19:41:58,343::vm::3237::virt.vm::(snapshot) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::<domainsnapshot>
        <disks/>
</domainsnapshot>

jsonrpc.Executor/0::ERROR::2016-04-12 19:41:58,346::vm::3252::virt.vm::(snapshot) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::Unable to take snapshot
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3250, in snapshot
    self._dom.snapshotCreateXML(snapxml, snapFlags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2581, in snapshotCreateXML
    if ret is None:raise libvirtError('virDomainSnapshotCreateXML() failed', dom=self)
libvirtError: unsupported configuration: nothing selected for snapshot
jsonrpc.Executor/7::DEBUG::2016-04-12 19:41:58,391::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.thaw' in bridge with {u'vmID': u'040609f6-cfe0-4763-8b32-08ffad158c93'}
jsonrpc.Executor/7::INFO::2016-04-12 19:41:58,391::vm::3041::virt.vm::(thaw) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::Thawing guest filesystems
jsonrpc.Executor/7::INFO::2016-04-12 19:41:58,396::vm::3056::virt.vm::(thaw) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::6 guest filesystems thawed
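
In the vdsm log above, VM.snapshot is called with an empty snapDrives list, the generated XML contains an empty <disks/> element, and libvirt then rejects it with "nothing selected for snapshot". A minimal diagnostic sketch of that check follows. It is not vdsm's own code: selected_disks is a hypothetical helper, and the XML string is a condensed copy of what was logged.

```python
import xml.etree.ElementTree as ET


def selected_disks(snapshot_xml: str) -> list:
    """Return the <disk> elements listed under <disks> in a domainsnapshot XML."""
    root = ET.fromstring(snapshot_xml)
    disks = root.find("disks")
    return [] if disks is None else disks.findall("disk")


# Condensed copy of the XML vdsm logged for the failing disk-only snapshot.
logged_xml = "<domainsnapshot><disks/></domainsnapshot>"

if not selected_disks(logged_xml):
    print("no disks selected in the snapshot XML")
```

Running the same check against snapshot XML logged for a working storage domain would show whether the empty disk list is specific to the Cinder/Ceph disks.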

This could be a guest agent issue. Please make sure that both ovirt-guest-agent and qemu-guest-agent are installed and running in the VM. Further details are available at: http://www.ovirt.org/documentation/internal/guest-agent/understanding-guest-agents-and-other-tools/
In addition, could you please attach the full engine and vdsm logs?
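
One way to check from the host whether qemu-guest-agent is answering is to send it a guest-ping through virsh. The sketch below is only an illustration under stated assumptions: "my-vm" is a placeholder domain name, the helper names are hypothetical, and it assumes you can open a read-write virsh connection on the host (on an oVirt host this may require credentials).

```python
import json
import subprocess


def guest_ping_argv(domain: str) -> list:
    """Build the virsh command line that sends guest-ping to the agent."""
    payload = json.dumps({"execute": "guest-ping"})
    return ["virsh", "qemu-agent-command", domain, payload]


def agent_responds(domain: str, timeout: int = 10) -> bool:
    """Return True if virsh exits 0, i.e. the agent answered the ping."""
    try:
        result = subprocess.run(
            guest_ping_argv(domain), capture_output=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # virsh missing, or no answer within the timeout.
        return False
    return result.returncode == 0


# "my-vm" is a placeholder domain name, not taken from the logs above.
print(" ".join(guest_ping_argv("my-vm")))
```

Calling agent_responds() with the real domain name before and after a snapshot attempt would show whether the agent stops answering once the filesystems are frozen.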
 

Everything else works well with Cinder (creating disks, running VMs, live migration, etc.). Live snapshots did work when I used a CephFS POSIX storage domain.

Versions:
Ceph 9.2.0
oVirt Latest
CentOS 7.2
Cinder 7.0.1-1.el7

Any help would be appreciated.

Thanks,
Kevin
