[ovirt-devel] VDSM crashed while adding host to newly created cluster

Christopher Pereira kripper at imatronix.cl
Thu Mar 26 21:16:24 UTC 2015


Continuing with the 3.6 nightly build testing:

While hosted-engine-setup was adding the host to the newly created
cluster, VDSM crashed, probably because the Gluster engine storage
disappeared, as in BZ 1201355 [1].

Facts:
     - the engine storage (/rhev/data-center/mnt/...) was unmounted
       during this process
     - another mount of the same volume was still mounted after the VDSM
       crash, so the problem may not be related to Gluster itself
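To check these two facts on the host, here is a minimal Python sketch. It assumes the standard Linux /proc/mounts format and that the Gluster FUSE client reports the filesystem type as fuse.glusterfs. The volume name "engine" and the helper names are hypothetical. It lists every FUSE mount of the volume and reports whether one of them is still under /rhev/data-center/mnt/:

```python
import os
from typing import List, Tuple

VDSM_MOUNT_ROOT = "/rhev/data-center/mnt/"  # root under which VDSM mounts storage


def gluster_mounts(mounts_text: str, volume: str) -> List[Tuple[str, str]]:
    """Return (source, mountpoint) pairs for FUSE mounts of a Gluster volume.

    mounts_text: content of /proc/mounts.
    volume: Gluster volume name (hypothetical, e.g. "engine").
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mountpoint, fstype = fields[0], fields[1], fields[2]
        # /proc/mounts encodes spaces in paths as the octal escape \040
        mountpoint = mountpoint.replace("\\040", " ")
        # Assumed source format: "host:/volume" (or "host:volume")
        vol = source.split(":", 1)[-1].lstrip("/")
        if fstype == "fuse.glusterfs" and vol == volume:
            found.append((source, mountpoint))
    return found


def vdsm_mount_present(mounts: List[Tuple[str, str]]) -> bool:
    """True if at least one mount lives under VDSM's mount root."""
    return any(mp.startswith(VDSM_MOUNT_ROOT) for _, mp in mounts)


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = gluster_mounts(f.read(), "engine")  # hypothetical volume name
    for source, mountpoint in mounts:
        print(source, "->", mountpoint)
    print("mounted under", VDSM_MOUNT_ROOT, ":", vdsm_mount_present(mounts))
```

If the volume still shows up under another mount point while the entry under /rhev/data-center/mnt/ is missing, that matches the situation described above.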

After running "hosted-engine --connect-storage", the volume is mounted
again.
Now, when I try to restart VDSM, I get an "invalid lockspace" error:

     Thread-46::ERROR::2015-03-26 19:24:31,843::vm::1237::vm.Vm::(_startUnderlyingVm) vmId=`191045ac-79e4-4ce8-aad7-52cc9af313c5`::The vm start process failed
     Traceback (most recent call last):
       File "/usr/share/vdsm/virt/vm.py", line 1185, in _startUnderlyingVm
         self._run()
       File "/usr/share/vdsm/virt/vm.py", line 2253, in _run
         self._connection.createXML(domxml, flags),
       File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 126, in wrapper
         ret = f(*args, **kwargs)
       File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3427, in createXML
         if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
     libvirtError: Failed to acquire lock: No space left on device
     Thread-46::INFO::2015-03-26 19:24:31,844::vm::1709::vm.Vm::(setDownStatus) vmId=`191045ac-79e4-4ce8-aad7-52cc9af313c5`::Changed state to Down: Failed to acquire lock: No space left on device (code=1)
     Thread-46::DEBUG::2015-03-26 19:24:31,844::vmchannels::214::vds::(unregister) Delete fileno 60 from listener.
     VM Channels Listener::DEBUG::2015-03-26 19:24:32,346::vmchannels::121::vds::(_do_del_channels) fileno 60 was removed from listener.

In sanlock.log we have:

     2015-03-26 19:24:30+0000 7589 [752]: cmd 9 target pid 9559 not found
     2015-03-26 19:24:31+0000 7589 [764]: r7 cmd_acquire 2,8,9559 invalid lockspace found -1 failed 935819904 name 7ba46e75-51af-4648-becc-5a469cb8e9c2

(All three lease files are present.)
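To find which lockspace sanlock rejected, here is a minimal Python sketch that pulls the fields out of cmd_acquire failure lines like the one quoted above. The regular expression is inferred only from that single line, not from sanlock documentation, so other sanlock versions may format the line differently. The helper name is hypothetical, and /var/log/sanlock.log is assumed to be the log location. The extracted name could then be compared against the lockspaces listed by `sanlock client status`:

```python
import os
import re
from typing import Dict, Optional

# Pattern inferred from the single cmd_acquire failure line quoted above (assumption)
ACQUIRE_FAIL = re.compile(
    r"^(?P<timestamp>\S+ \S+) \d+ \[\d+\]: "
    r"(?P<resource>r\d+) cmd_acquire (?P<ids>[\d,]+) "
    r"(?P<reason>.+?) found -?\d+ failed \d+ "
    r"name (?P<lockspace>\S+)$"
)


def parse_acquire_failure(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a cmd_acquire failure line, or None if it is not one."""
    m = ACQUIRE_FAIL.match(line.strip())
    return m.groupdict() if m else None


LOG = "/var/log/sanlock.log"  # assumed default log path
if __name__ == "__main__" and os.access(LOG, os.R_OK):
    with open(LOG) as f:
        for line in f:
            info = parse_acquire_failure(line)
            if info is not None:
                print(info["timestamp"], info["reason"], "lockspace:", info["lockspace"])
```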

This problem is similar to BZ 1201355 reported by Sandro [1].

About the hosted-engine VM not being resumed after restarting VDSM,
please check [2] and [3] (a duplicate of [2]).
I confirmed that QEMU does not reopen its file descriptors when
resuming paused VMs, which explains those issues.

Now, how can I fix the "invalid lockspace"?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1201355
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1172905
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1058300
