GlusterFS storage domain disappears while deploying hosted-engine

Sandro,

Please take a look at BZ 1201355. In my last comment, I posted a solution to keep the storage and the hosted-engine VM alive during hosted-engine --deploy. We just need to move the glusterfs process out of the vdsmd.service cgroup, so that the storage and the VM are not killed when vdsmd restarts.

Please let me know when I can test a patched nightly build.

Regards,
Christopher
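The failure mode described above can be checked directly on a host. A minimal shell sketch follows; it assumes the glusterfs FUSE mount helper is matched by `pgrep -f glusterfs` and that the host uses a systemd cgroup layout (both are assumptions for illustration, not details stated in the thread):

```shell
# Print the cgroup path a process belongs to, read from /proc/<pid>/cgroup.
# Handles both cgroup v1 lines ("1:name=systemd:/...") and the single
# cgroup v2 line ("0::/...").
cgroup_of() {
    cut -d: -f3- "/proc/$1/cgroup" | head -n1
}

# If the glusterfs mount helper reports a path ending in vdsmd.service,
# systemd will kill it on `systemctl restart vdsmd`, taking the storage
# domain (and the hosted-engine VM on it) down with it.
pid=$(pgrep -f glusterfs | head -n1 || true)
if [ -n "$pid" ]; then
    cgroup_of "$pid"
else
    echo "no glusterfs process found on this host"
fi
```

On an affected host the printed path would contain `vdsmd.service`; after the proposed fix (or the manual-mount workaround below in the thread), it should not.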

On 06/04/2015 10:27, Christopher Pereira wrote:
> Sandro,
>
> Please take a look at BZ 1201355. In my last comment, I posted a solution to keep the storage and the hosted-engine VM alive during hosted-engine --deploy. We just need to move the glusterfs process out of the vdsmd.service cgroup, so that the storage and the VM are not killed when vdsmd restarts.
>
> Please let me know when I can test a patched nightly build.
>
> Regards,
> Christopher

Hi Christopher, thanks for the workaround you described. I'm trying to validate it using a master nightly build, but I can't get libvirt working properly with Gluster 3.7: libvirt crashes on VM shutdown. I'll keep you updated as soon as I have new code to be tested there.

--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com

On 07-04-2015 12:53, Sandro Bonazzola wrote:
> On 06/04/2015 10:27, Christopher Pereira wrote:
>> Sandro,
>> Please take a look at BZ 1201355. In my last comment, I posted a solution to keep the storage and the hosted-engine VM alive during hosted-engine --deploy. We just need to move the glusterfs process out of the vdsmd.service cgroup, so that the storage and the VM are not killed when vdsmd restarts.
>> Please let me know when I can test a patched nightly build.
>
> Hi Christopher, thanks for the workaround you described.

I found out it is safer to mount /rhev manually (instead of doing --connect-storage) so that it runs outside of the vdsmd.service cgroup. Try monitoring whether the glusterfs process survives a vdsmd restart before going ahead with the --deploy.

> I'm trying to validate it using a master nightly build, but I can't get libvirt working properly with Gluster 3.7. I have libvirt crashing on VM shutdown. I'll keep you updated as soon as I have new code to be tested there.

What's the error message?
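The monitoring step suggested here can be sketched as a small script. This is an illustrative sketch only: the `pgrep -f glusterfs` match is an assumption, the restart is skipped gracefully if the vdsmd service is not present, and a real run requires root on an oVirt host:

```shell
# Liveness check: does a PID still exist?
is_alive() { kill -0 "$1" 2>/dev/null; }

# PID of the glusterfs mount helper before the restart (assumption:
# `pgrep -f glusterfs` matches the FUSE mount process).
pid=$(pgrep -f glusterfs | head -n1 || true)
echo "glusterfs pid before restart: ${pid:-none}"

# Restart vdsmd; a no-op message if the service is absent on this machine.
systemctl restart vdsmd 2>/dev/null || echo "vdsmd not available here, skipping restart"
sleep 2

if [ -n "$pid" ] && is_alive "$pid"; then
    echo "OK: glusterfs survived the vdsmd restart"
elif [ -n "$pid" ]; then
    echo "glusterfs was killed - still inside the vdsmd.service cgroup?"
fi
```

If the second message prints, the mount helper is still in the vdsmd.service cgroup and hosted-engine --deploy will lose its storage mid-run.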

On 07/04/2015 18:17, Christopher Pereira wrote:
> On 07-04-2015 12:53, Sandro Bonazzola wrote:
>> [...]
>> What's the error message?
>
> You can see vdsm recovering from the crash right after VM shutdown.
>
> libvirt.log shows:
>
> [2015-04-07 12:34:55.402213] E [glfs.c:1011:pub_glfs_fini] 0-glfs: call_pool_cnt - 0,pin_refcnt - 0
> [2015-04-07 12:34:55.402826] E [rpc-transport.c:512:rpc_transport_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7ff7de685516] (--> /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7ff7de454493] (--> /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7ff7de4577dc] (--> /lib64/libglusterfs.so.0(+0x1edc1)[0x7ff7de681dc1] (--> /lib64/libglusterfs.so.0(+0x1ed55)[0x7ff7de681d55] ))))) 0-rpc_transport: invalid argument: this
> 2015-04-07 12:34:56.827+0000: shutting down
>
> VDSM shows:
>
> libvirtEventLoop::DEBUG::2015-04-07 14:34:55,202::vm::4541::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`5c5bdfc6-ee5c-4141-899a-bbc5f0790ac6`::event Shutdown detail 0 opaque None
> VM Channels Listener::DEBUG::2015-04-07 14:34:56,826::vmchannels::54::vds::(_handle_event) Received 00000019 on fileno 56
>
> And when it tries to start the VM again:
>
> Thread-158::ERROR::2015-04-07 14:35:07,241::vm::1145::vm.Vm::(_startUnderlyingVm) vmId=`5c5bdfc6-ee5c-4141-899a-bbc5f0790ac6`::The vm start process failed
> Traceback (most recent call last):
>   File "/usr/share/vdsm/virt/vm.py", line 1093, in _startUnderlyingVm
>     self._run()
>   File "/usr/share/vdsm/virt/vm.py", line 2191, in _run
>     self._connection.createXML(domxml, flags),
>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 144, in wrapper
>     __connections.get(id(target)).pingLibvirt()
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3579, in getLibVersion
>     if ret == -1: raise libvirtError('virConnectGetLibVersion() failed', conn=self)
> libvirtError: internal error: client socket is closed

--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
Participants (2):
- Christopher Pereira
- Sandro Bonazzola