[ovirt-users] Hosted engine on gluster problem

Sahina Bose sabose at redhat.com
Wed Apr 13 14:21:08 UTC 2016



On 04/12/2016 01:33 PM, Sandro Bonazzola wrote:
>
>
> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl <dbond at nrggos.com.au> wrote:
>
>     My setup is hyperconverged. I have placed my test results in
>     https://bugzilla.redhat.com/show_bug.cgi?id=1298693
>
>
> Ok, so you're aware of the limitation of the single point of failure. If
> the host referenced in the hosted-engine configuration from the initial
> setup goes down, the hosted engine won't be able to connect to the shared
> storage even if the other hosts in the cluster are up, since the entry
> point is down.
> Note that hyperconverged deployment is not supported in 3.6.


This issue does not seem related to the single point of failure. I tested 
this on a 3-node setup with each node mounting the volume hosting the HE as 
localhost:/engine. Since all nodes have glusterd running and belong to the 
same cluster, the mount should continue to work with any one node down.
But the HE VM is restarted once a node is powered off.
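
A quick way to confirm that the localhost mount itself stays up while a peer 
node is down is a check along these lines (an illustrative sketch, not part 
of ovirt-hosted-engine-ha; it assumes the localhost:/engine mount source 
from the setup above):

MOUNT_SOURCE = "localhost:/engine"   # mount source used in this test setup

def engine_volume_mount_point(proc_mounts="/proc/mounts"):
    # Scan the mount table for the hosted-engine gluster FUSE mount.
    with open(proc_mounts) as mounts:
        for line in mounts:
            fields = line.split()
            # fields: source, mount point, fstype, options, dump, pass
            if fields[0] == MOUNT_SOURCE and fields[2] == "fuse.glusterfs":
                return fields[1]
    return None

if __name__ == "__main__":
    mount_point = engine_volume_mount_point()
    print(mount_point if mount_point else "engine volume is not mounted")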

broker.log:
Thread-4602::ERROR::2016-04-13 18:50:28,249::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=7fe3707b-2435-4e71-b831-4daba08cc72c'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 456, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 7fe3707b-2435-4e71-b831-4daba08cc72c not found in /rhev/data-center/mnt/glusterSD
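
For context, the lookup that fails here boils down to scanning the glusterSD 
mount directory for the storage-domain UUID. Roughly (a simplified sketch of 
the failing step, not the actual storage_backends.py code):

import os

GLUSTER_MNT = "/rhev/data-center/mnt/glusterSD"

def find_domain_path(sd_uuid, parent=GLUSTER_MNT):
    # Each gluster storage domain is expected under
    # <parent>/<server:_volume>/<sd_uuid>. Once the FUSE mount has been
    # torn down, no entry contains the UUID and the lookup fails, which
    # is what surfaces as BackendFailureException in the traceback above.
    for mount in os.listdir(parent):
        candidate = os.path.join(parent, mount, sd_uuid)
        if os.path.isdir(candidate):
            return candidate
    raise RuntimeError(
        "path to storage domain {0} not found in {1}".format(sd_uuid, parent))

In other words, the broker error is a consequence of the mount disappearing, 
not of gluster refusing access.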

agent.log:
MainThread::INFO::2016-04-13 18:50:26,020::storage_server::207::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2016-04-13 18:50:28,054::hosted_engine::807::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor for 7fe3707b-2435-4e71-b831-4daba08cc72c
MainThread::INFO::2016-04-13 18:50:28,055::image::184::ovirt_hosted_engine_ha.lib.image.Image::(teardown_images) Teardown images
MainThread::WARNING::2016-04-13 18:50:28,177::hosted_engine::675::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Disconnecting the storage
MainThread::INFO::2016-04-13 18:50:28,177::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server) Disconnecting storage server



The gluster mount logs for this time frame contain unmount messages:
[2016-04-13 13:20:28.199429] I [fuse-bridge.c:4997:fuse_thread_proc] 0-fuse: unmounting /rhev/data-center/mnt/glusterSD/localhost:_engine
[2016-04-13 13:20:28.199934] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff9b3ceddc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7ff9b53588b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7ff9b5358739] ) 0-: received signum (15), shutting down
[2016-04-13 13:20:28.199970] I [fuse-bridge.c:5704:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/localhost:_engine'.
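
To spot the same pattern on other nodes, the client log can be scanned for 
the unmount/SIGTERM sequence, e.g. (an illustrative sketch; the log file 
name is an assumption based on gluster naming client logs after the mount 
point, so adjust it for your setup):

import re

# Assumed log file name, derived from the mount point path.
LOG = "/var/log/glusterfs/rhev-data-center-mnt-glusterSD-localhost:_engine.log"

PATTERNS = [
    re.compile(r"fuse_thread_proc.*unmounting"),
    re.compile(r"received signum \(15\)"),
    re.compile(r"0-fuse: Unmounting"),
]

with open(LOG) as log:
    for line in log:
        if any(pattern.search(line) for pattern in PATTERNS):
            print(line.rstrip())

The "received signum (15)" line means the glusterfs client process was sent 
SIGTERM, i.e. the mount was deliberately torn down (consistent with the 
disconnect_storage_server call at the same moment in agent.log above) rather 
than gluster itself failing.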


>
>
>     Short description of setup:
>
>     3 hosts with 2 disks each, set up with gluster replica 3 across the
>     6 disks; volume name: hosted-engine.
>
>     Hostname hosted-storage configured in /etc/hosts to point to host1.
>
>     Installed hosted engine on host1 with the hosted engine storage
>     path = hosted-storage:/hosted-engine
>
>     Initial engine install on h1 was successful. Hosts h2 and h3 added to
>     the hosted engine. All works fine.
>
>     Additional storage and non-hosted engine hosts added etc.
>
>     Additional VMs added to hosted-engine storage (oVirt Reports VM
>     and Cinder VM). The additional VMs are hosted by other storage -
>     Cinder and NFS.
>
>     The system is in production.
>
>
>     Engine can be migrated around with the web interface.
>
>
>     - 3.6.4 upgrade released; followed the upgrade guide; engine was
>     upgraded first; the new CentOS kernel requires a host reboot.
>
>     - Engine placed on h2 - h3 into maintenance (local), upgraded and
>     rebooted h3 - no issues - local maintenance removed from h3.
>
>     - Engine placed on h3 - h2 into maintenance (local), upgraded and
>     rebooted h2 - no issues - local maintenance removed from h2.
>
>     - Engine placed on h3 - h1 into maintenance (local), upgraded and
>     rebooted h1 - engine crashes and does not start elsewhere;
>     VM(cinder) on h3 on the same gluster volume pauses.
>
>     - Host 1 takes about 5 minutes to reboot (enterprise box with all
>     its normal BIOS probing).
>
>     - Engine starts after h1 comes back and stabilises
>
>     - VM(cinder) unpauses itself; VM(reports) continued fine the
>     whole time. I could do no diagnosis on the 2 VMs as the engine was
>     not available.
>
>     - Local maintenance removed from h1
>
>
>     I don't believe the issue is with gluster itself as the volume
>     remains accessible on all hosts during this time albeit with a
>     missing server (gluster volume status) as each gluster server is
>     rebooted.
>
>     Gluster was upgraded as part of the process, no issues were seen here.
>
>
>     I have been able to duplicate the issue without the upgrade by
>     following the same sort of timeline.
>
>
>     ________________________________
>     From: Sandro Bonazzola <sbonazzo at redhat.com>
>     Sent: Monday, 11 April 2016 7:11 PM
>     To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak;
>     Sahina Bose
>     Cc: Bond, Darryl; users
>     Subject: Re: [ovirt-users] Hosted engine on gluster problem
>
>
>
>     On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck
>     <hawk at tbi.univie.ac.at> wrote:
>     Hi Darryl,
>
>     I'm still experimenting with my oVirt installation so I tried to
>     recreate the problems you've described.
>
>     My setup has three HA hosts for virtualization and three machines
>     for the gluster replica 3 setup.
>
>     I manually migrated the Engine from the initial install host (one)
>     to host three. Then shut down host one manually and interrupted the
>     fencing mechanisms so the host stayed down. This didn't bother the
>     Engine VM at all.
>
>     Did you move host one to maintenance before shutting it down?
>     Or is this a crash recovery test?
>
>
>
>     To make things a bit more challenging I then shut down host three
>     while running the Engine VM. Of course the Engine was down for some
>     time until host two detected the problem. It started the Engine VM
>     and everything seems to be running quite well without the initial
>     install host.
>
>     Thanks for the feedback!
>
>
>
>     My only problem is that the HA agents on hosts two and three refuse to
>     start after a reboot because the hosted-engine configuration is
>     missing. I wrote another mail to users at ovirt.org about that.
>
>     This is weird. Martin, Simone, can you please investigate this?
>
>
>
>
>     Cheers
>     Richard
>
>     On 04/08/2016 01:38 AM, Bond, Darryl wrote:
>     > There seems to be a pretty severe bug with using hosted engine
>     on gluster.
>     >
>     > If the host that was used as the initial hosted-engine --deploy
>     host goes away, the engine VM will crash and cannot be restarted
>     until the host comes back.
>
>     Is this a hyperconverged setup?
>
>
>     >
>     > This is regardless of which host the engine was currently running on.
>     >
>     >
>     > The issue seems to be buried in the bowels of VDSM and is not an
>     issue with gluster itself.
>
>     Sahina, can you please investigate this?
>
>
>     >
>     > The gluster filesystem is still accessible from the host that
>     was running the engine. The issue has been submitted to bugzilla
>     but the fix is some way off (4.1).
>     >
>     >
>     > Can my hosted engine be converted to use NFS (using the gluster
>     NFS server on the same filesystem) without rebuilding my hosted
>     engine (i.e. change domainType=glusterfs to domainType=nfs)?
>
>     >
>     > What effect would that have on the hosted-engine storage domain
>     inside oVirt, i.e. would the same filesystem be mounted twice, or
>     would it just break?
>     >
>     >
>     > Will this actually fix the problem, or does it have the same issue
>     when the hosted engine is on NFS?
>     >
>     >
>     > Darryl
>     >
>     >
>     >
>     >
>
>
>     --
>     /dev/null
>
>
>
>
>
>
>     --
>     Sandro Bonazzola
>     Better technology. Faster innovation. Powered by community
>     collaboration.
>     See how it works at redhat.com <http://redhat.com>
>
>
>
>
> -- 
> Sandro Bonazzola
> Better technology. Faster innovation. Powered by community collaboration.
> See how it works at redhat.com <http://redhat.com>
