[ovirt-users] Hosted engine on gluster problem

Sandro Bonazzola sbonazzo at redhat.com
Wed Apr 13 07:15:46 UTC 2016


On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer at redhat.com> wrote:

> On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves
> <luizcpg at gmail.com> wrote:
> > Hi Sandro, I've been using gluster with 3 external hosts for a while and
> > things are working pretty well. However, this single point of failure
> > looks like a simple feature to implement, but it is critical to anyone who
> > wants to use gluster in production. This is not hyperconvergence, which has
> > other issues/implications. So, why not have this feature in the 3.6 branch?
> > It looks like it would just be a matter of letting vdsm use the
> > 'backupvol-server' option when mounting the engine domain and doing the
> > proper tests.
>
> Can you explain what the problem is, and what the suggested solution is?
>
> Engine and vdsm already support the backupvol-server option - you can
> define this option in the storage domain options when you create a gluster
> storage domain. With this option vdsm should be able to connect to the
> gluster storage domain even if a brick is down.
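>
> As a concrete sketch (host and volume names here are made up, and the
> option is spelled backup-volfile-servers in current glusterfs releases,
> backupvolfile-server in older ones), the mount vdsm performs is roughly
> equivalent to:
>
>     # try gluster1 for the volfile first, fall back to gluster2/gluster3
>     mount -t glusterfs \
>         -o backup-volfile-servers=gluster2:gluster3 \
>         gluster1:/engine /mnt/engine
>
> In the engine UI this maps to putting backup-volfile-servers=... in the
> storage domain's mount options field.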
>
> If you don't have this option in the engine, you probably cannot add it
> with hosted-engine setup, since to edit it you must put the storage domain
> into maintenance, and if you do this the engine vm will be killed :-) This
> is one of the issues with the engine managing the storage domain it runs
> on.
>
> I think the best way to avoid this issue is to add a DNS entry providing
> the addresses of all the gluster bricks, and use this address for the
> gluster storage domain. This way the glusterfs mount helper can mount the
> domain even if one of the gluster bricks is down.
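>
> A minimal sketch of that DNS entry (names and addresses are examples, not
> taken from this setup) would be one A record per gluster server:
>
>     ; zone file fragment - one name resolving to every gluster server
>     gluster-storage  IN  A  192.0.2.1
>     gluster-storage  IN  A  192.0.2.2
>     gluster-storage  IN  A  192.0.2.3
>
> with the storage domain then defined as gluster-storage:/engine instead of
> pointing at a single host.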
>
> Again, we will need some magic from the hosted engine developers to
> modify the address of the hosted engine gluster domain on an existing
> system.
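>
> On an existing host that would presumably mean editing the hosted-engine
> configuration - a sketch only, with keys and path as commonly found in
> /etc/ovirt-hosted-engine/hosted-engine.conf and not verified against 3.6:
>
>     # point the engine storage domain at the multi-address name and
>     # pass the backup servers as mount options (values are examples)
>     storage=gluster-storage:/engine
>     mnt_options=backup-volfile-servers=gluster2:gluster3
>
> and then restarting ovirt-ha-agent and ovirt-ha-broker on that host,
> which is exactly the part that needs the hosted-engine developers'
> blessing.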
>

Magic won't happen without a bz :-) please open one describing what's
requested.



>
> Nir
>
> >
> > Could you add this feature to the next release of the 3.6 branch?
> >
> > Thanks
> > Luiz
> >
> > On Tue, Apr 12, 2016 at 05:03, Sandro Bonazzola <sbonazzo at redhat.com>
> > wrote:
> >>
> >> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl <dbond at nrggos.com.au>
> >> wrote:
> >>>
> >>> My setup is hyperconverged. I have placed my test results in
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1298693
> >>>
> >>
> >> Ok, so you're aware of the limitation of the single point of failure.
> >> If you drop the host referenced in the hosted-engine configuration from
> >> the initial setup, the remaining hosts won't be able to connect to the
> >> shared storage even if they are up, since the entry point is down.
> >> Note that hyperconverged deployment is not supported in 3.6.
> >>
> >>
> >>>
> >>>
> >>> Short description of setup:
> >>>
> >>> 3 hosts with 2 disks each, set up as a gluster replica 3 volume across
> >>> the 6 disks; volume name hosted-engine.
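> >>>
> >>> As a sketch only (brick paths are assumptions, not taken from this
> >>> thread), such a volume would have been created along these lines:
> >>>
> >>>     # replica 3 across 6 bricks = 2 x 3 distributed-replicate
> >>>     gluster volume create hosted-engine replica 3 \
> >>>         h1:/bricks/b1/brick h2:/bricks/b1/brick h3:/bricks/b1/brick \
> >>>         h1:/bricks/b2/brick h2:/bricks/b2/brick h3:/bricks/b2/brick
> >>>     gluster volume start hosted-engine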
> >>>
> >>> Hostname hosted-storage configured in /etc/hosts to point to host1.
> >>>
> >>> Installed hosted engine on host1 with the hosted engine storage path =
> >>> hosted-storage:/hosted-engine
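> >>>
> >>> In other words (the address here is illustrative, not the actual value),
> >>> every mount of the engine domain goes through host1:
> >>>
> >>>     # /etc/hosts on each host (example address)
> >>>     10.0.0.1   hosted-storage
> >>>
> >>>     # storage path given to hosted-engine --deploy
> >>>     hosted-storage:/hosted-engine
> >>>
> >>> which is exactly the single entry point discussed above.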
> >>>
> >>> Install first engine on h1 successful. Hosts h2 and h3 added to the
> >>> hosted engine. All works fine.
> >>>
> >>> Additional storage and non-hosted engine hosts added etc.
> >>>
> >>> Additional VMs added to hosted-engine storage (oVirt Reports VM and
> >>> Cinder VM). Other VMs are hosted on other storage - Cinder and NFS.
> >>>
> >>> The system is in production.
> >>>
> >>>
> >>> Engine can be migrated around with the web interface.
> >>>
> >>>
> >>> - 3.6.4 upgrade released; followed the upgrade guide, engine upgraded
> >>> first, new CentOS kernel requires host reboot.
> >>>
> >>> - Engine placed on h2 - h3 into maintenance (local), upgrade and reboot
> >>> h3 - No issues - Local maintenance removed from h3.
> >>>
> >>> - Engine placed on h3 - h2 into maintenance (local), upgrade and reboot
> >>> h2 - No issues - Local maintenance removed from h2.
> >>>
> >>> - Engine placed on h3 - h1 into maintenance (local), upgrade and reboot
> >>> h1 - engine crashes and does not start elsewhere; VM(cinder) on h3 on the
> >>> same gluster volume pauses.
> >>>
> >>> - Host 1 takes about 5 minutes to reboot (enterprise box with all its
> >>> normal BIOS probing).
> >>>
> >>> - Engine starts after h1 comes back and stabilises
> >>>
> >>> - VM(cinder) unpauses itself; VM(reports) continued fine the whole time.
> >>> I could do no diagnosis on the 2 VMs as the engine was not available.
> >>>
> >>> - Local maintenance removed from h1
> >>>
> >>>
> >>> I don't believe the issue is with gluster itself, as the volume remains
> >>> accessible on all hosts during this time, albeit with a missing server
> >>> (gluster volume status), as each gluster server is rebooted.
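> >>>
> >>> For reference, the per-host check was essentially the standard status
> >>> command (volume name as above; the exact output is not reproduced here):
> >>>
> >>>     # expect the rebooting server's brick to show as offline,
> >>>     # while the remaining bricks stay online
> >>>     gluster volume status hosted-engine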
> >>>
> >>> Gluster was upgraded as part of the process; no issues were seen here.
> >>>
> >>>
> >>> I have been able to duplicate the issue without the upgrade by following
> >>> the same sort of timeline.
> >>>
> >>>
> >>> ________________________________
> >>> From: Sandro Bonazzola <sbonazzo at redhat.com>
> >>> Sent: Monday, 11 April 2016 7:11 PM
> >>> To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak; Sahina
> >>> Bose
> >>> Cc: Bond, Darryl; users
> >>> Subject: Re: [ovirt-users] Hosted engine on gluster problem
> >>>
> >>>
> >>>
> >>> On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck
> >>> <hawk at tbi.univie.ac.at> wrote:
> >>> Hi Darryl,
> >>>
> >>> I'm still experimenting with my oVirt installation so I tried to
> >>> recreate the problems you've described.
> >>>
> >>> My setup has three HA hosts for virtualization and three machines
> >>> for the gluster replica 3 setup.
> >>>
> >>> I manually migrated the Engine from the initial install host (one)
> >>> to host three. Then shut down host one manually and interrupted the
> >>> fencing mechanisms so the host stayed down. This didn't bother the
> >>> Engine VM at all.
> >>>
> >>> Did you move host one to maintenance before shutting it down?
> >>> Or is this a crash recovery test?
> >>>
> >>>
> >>>
> >>> To make things a bit more challenging I then shut down host three
> >>> while running the Engine VM. Of course the Engine was down for some
> >>> time until host two detected the problem. It started the Engine VM
> >>> and everything seems to be running quite well without the initial
> >>> install host.
> >>>
> >>> Thanks for the feedback!
> >>>
> >>>
> >>>
> >>> My only problem is that the HA agents on hosts two and three refuse to
> >>> start after a reboot because the hosted engine configuration is missing.
> >>> I wrote another mail to users at ovirt.org about that.
> >>>
> >>> This is weird. Martin, Simone, can you please investigate this?
> >>>
> >>>
> >>>
> >>>
> >>> Cheers
> >>> Richard
> >>>
> >>> On 04/08/2016 01:38 AM, Bond, Darryl wrote:
> >>> > There seems to be a pretty severe bug with using hosted engine on
> >>> > gluster.
> >>> >
> >>> > If the host that was used as the initial hosted-engine --deploy host
> >>> > goes away, the engine VM will crash and cannot be restarted until that
> >>> > host comes back.
> >>>
> >>> Is this a hyperconverged setup?
> >>>
> >>>
> >>> >
> >>> > This is regardless of which host the engine was currently running on.
> >>> >
> >>> >
> >>> > The issue seems to be buried in the bowels of VDSM and is not an
> issue
> >>> > with gluster itself.
> >>>
> >>> Sahina, can you please investigate this?
> >>>
> >>>
> >>> >
> >>> > The gluster filesystem is still accessible from the host that was
> >>> > running the engine. The issue has been submitted to Bugzilla, but the
> >>> > fix is some way off (4.1).
> >>> >
> >>> >
> >>> > Can my hosted engine be converted to use NFS (using the gluster NFS
> >>> > server on the same filesystem) without rebuilding my hosted engine
> >>> > (i.e. change domainType=glusterfs to domainType=nfs)?
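> >>> >
> >>> > (If that route were taken, the change would presumably live in the
> >>> > hosted-engine configuration on each HA host - a sketch only, untested,
> >>> > and assuming gluster's NFS export is enabled on the volume:
> >>> >
> >>> >     # /etc/ovirt-hosted-engine/hosted-engine.conf (hypothetical edit)
> >>> >     domainType=nfs
> >>> >     storage=hosted-storage:/hosted-engine
> >>> > )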
> >>>
> >>> >
> >>> > What effect would that have on the hosted-engine storage domain inside
> >>> > oVirt, i.e. would the same filesystem be mounted twice or would it just
> >>> > break?
> >>> >
> >>> >
> >>> > Will this actually fix the problem, or does it have the same issue when
> >>> > the hosted engine is on NFS?
> >>> >
> >>> >
> >>> > Darryl
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>>
> >>>
> >>> --
> >>> /dev/null
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Sandro Bonazzola
> >>> Better technology. Faster innovation. Powered by community collaboration.
> >>> See how it works at redhat.com
> >>
> >>
> >>
> >>
> >> --
> >> Sandro Bonazzola
> >> Better technology. Faster innovation. Powered by community collaboration.
> >> See how it works at redhat.com
> >
> >
> >
>



-- 
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com