[ovirt-devel] GlusterFS storage domain disappears while deploying hosted-engine

Sandro Bonazzola sbonazzo at redhat.com
Wed Apr 8 09:05:26 UTC 2015


On 07/04/2015 18:17, Christopher Pereira wrote:
> 
> On 07-04-2015 12:53, Sandro Bonazzola wrote:
>> On 06/04/2015 10:27, Christopher Pereira wrote:
>>> Sandro,
>>>
>>> Please take a look at BZ 1201355.
>>> In my last comment, I posted a solution to keep the storage and the hosted-engine VM alive during hosted-engine --deploy.
>>> We just need to move the glusterfs process out of the vdsmd.service cgroup, so that the storage and the VM are not killed when vdsmd restarts.
>>>
>>> Please let me know when I can test a patched nightly build.
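[Editor's note: a minimal sketch of how one might check whether the glusterfs client landed inside the vdsmd.service cgroup, which is what makes it die on a vdsmd restart. The process-match pattern is an assumption; adjust it to your mount point.]

```shell
# Find the glusterfs FUSE client process (match pattern is an example,
# not taken from the thread -- adjust to your storage domain).
pid=$(pgrep -f glusterfs | head -n1)

# Show which cgroup(s) the process belongs to. If this lists
# vdsmd.service, restarting vdsmd will kill the mount.
cat "/proc/${pid}/cgroup"

# systemctl status also prints the unit's cgroup tree, so the
# glusterfs process would show up here if it is in the wrong place.
systemctl status vdsmd.service
```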
>> Hi Christopher, thanks for the workaround you described.
> I found out it is safer to mount /rhev manually (instead of using --connect-storage) so that it runs outside of the vdsmd.service cgroup.
> Check that the glusterfs process survives a vdsmd restart before proceeding with --deploy.
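[Editor's note: the manual-mount workaround might look roughly like the following. The server name, volume name, and mount path are illustrative assumptions, not values from the thread; oVirt's gluster mount paths under /rhev follow the <server>:_<volume> naming scheme.]

```shell
# Mount the gluster volume from a normal shell, so the glusterfs
# client is not spawned inside the vdsmd.service cgroup.
# "server" and "engine" below are placeholder names.
mkdir -p /rhev/data-center/mnt/glusterSD/server:_engine
mount -t glusterfs server:/engine \
    /rhev/data-center/mnt/glusterSD/server:_engine

# Verify the client survives a vdsmd restart before running --deploy:
systemctl restart vdsmd
pgrep -af glusterfs   # the client process should still be listed
```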
>> I'm trying to validate it using a master nightly build, but I can't manage to get libvirt working properly with Gluster 3.7.
>> I have libvirt crashing on VM shutdown.
>> I'll keep you updated as soon as I have new code to be tested there.
> What's the error message?
> 

You can see VDSM recovering from a crash right after VM shutdown.
libvirt.log shows:

[2015-04-07 12:34:55.402213] E [glfs.c:1011:pub_glfs_fini] 0-glfs: call_pool_cnt - 0,pin_refcnt - 0
[2015-04-07 12:34:55.402826] E [rpc-transport.c:512:rpc_transport_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7ff7de685516] (-->
/lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7ff7de454493] (--> /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7ff7de4577dc] (-->
/lib64/libglusterfs.so.0(+0x1edc1)[0x7ff7de681dc1] (--> /lib64/libglusterfs.so.0(+0x1ed55)[0x7ff7de681d55] ))))) 0-rpc_transport: invalid argument: this
2015-04-07 12:34:56.827+0000: shutting down

VDSM shows:

libvirtEventLoop::DEBUG::2015-04-07 14:34:55,202::vm::4541::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`5c5bdfc6-ee5c-4141-899a-bbc5f0790ac6`::event
Shutdown detail 0 opaque None
VM Channels Listener::DEBUG::2015-04-07 14:34:56,826::vmchannels::54::vds::(_handle_event) Received 00000019 on fileno 56

and when it tries to start the VM again:

Thread-158::ERROR::2015-04-07 14:35:07,241::vm::1145::vm.Vm::(_startUnderlyingVm) vmId=`5c5bdfc6-ee5c-4141-899a-bbc5f0790ac6`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 1093, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 2191, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 144, in wrapper
    __connections.get(id(target)).pingLibvirt()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3579, in getLibVersion
    if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
libvirtError: internal error: client socket is closed



-- 
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com


