Hello,
I'm running into a problem with live snapshots not working when using
Cinder/Ceph disks. The failure is different depending on whether memory is
included, but in both cases Cinder/Ceph creates a new snapshot that can be
seen in both Cinder and Ceph. When I take a memory+disk snapshot, the VM
ends up paused and I have to kill -9 the qemu process before I can boot it
again. The engine seems to lose its connection to vdsm on the VM host after
the guest's filesystems are frozen. The guest never receives the thaw
command, and the engine log reports a message timeout. Here are some log
snippets from the engine:

2016-04-12 19:24:58,851 INFO [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-8-thread-27) [5c4493e] Ending command 'org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand' successfully.
2016-04-12 19:27:56,873 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-27) [4d97ca06] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM OVCL1A command failed: Message timeout which can be caused by communication issues
2016-04-12 19:27:56,873 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler_Worker-27) [4d97ca06] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=5022, message=Message timeout which can be caused by communication issues]]'
2016-04-12 19:27:56,874 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler_Worker-27) [4d97ca06] HostName = OVCL1A
2016-04-12 19:27:56,874 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (DefaultQuartzScheduler_Worker-27) [4d97ca06] Command 'SnapshotVDSCommand(HostName = OVCL1A, SnapshotVDSCommandParameters:{runAsync='true', hostId='9bdfaedc-34a8-4a08-ad8a-c117835a6094', vmId='040609f6-cfe0-4763-8b32-08ffad158c93'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2016-04-12 19:27:56,875 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-8-thread-16) [4d97ca06] Host 'OVCL1A' is not responding.

Disk-only live snapshots freeze the guest filesystems, and the VM does
receive the thaw command, but afterwards the VM is unresponsive. It still
answers pings on the network, but it is otherwise hung and again needs a
kill -9 of the qemu process before it can be booted. From the vdsm log:

jsonrpc.Executor/0::DEBUG::2016-04-12 19:41:58,342::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.snapshot' in bridge with {u'frozen': True, u'vmID': u'040609f6-cfe0-4763-8b32-08ffad158c93', u'snapDrives': []}
jsonrpc.Executor/0::INFO::2016-04-12 19:41:58,343::vm::3237::virt.vm::(snapshot) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::<domainsnapshot>
    <disks/>
</domainsnapshot>
jsonrpc.Executor/0::ERROR::2016-04-12 19:41:58,346::vm::3252::virt.vm::(snapshot) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::Unable to take snapshot
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3250, in snapshot
    self._dom.snapshotCreateXML(snapxml, snapFlags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2581, in snapshotCreateXML
    if ret is None:raise libvirtError('virDomainSnapshotCreateXML() failed', dom=self)
libvirtError: unsupported configuration: nothing selected for snapshot
jsonrpc.Executor/7::DEBUG::2016-04-12 19:41:58,391::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.thaw' in bridge with {u'vmID': u'040609f6-cfe0-4763-8b32-08ffad158c93'}
jsonrpc.Executor/7::INFO::2016-04-12 19:41:58,391::vm::3041::virt.vm::(thaw) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::Thawing guest filesystems
jsonrpc.Executor/7::INFO::2016-04-12 19:41:58,396::vm::3056::virt.vm::(thaw) vmId=`040609f6-cfe0-4763-8b32-08ffad158c93`::6 guest filesystems thawed

Everything else works well with Cinder for running VMs (creating disks,
running VMs, live migration, etc.). Live snapshots did work when I used a
CephFS POSIX storage domain.
Versions:
Ceph 9.2.0
oVirt latest
CentOS 7.2
Cinder 7.0.1-1.el7
Any help would be appreciated.
Thanks,
Kevin