[ovirt-users] Hosted engine on gluster problem

Luiz Claudio Prazeres Goncalves luizcpg at gmail.com
Fri Apr 15 13:00:37 UTC 2016


I'm not planning to move to oVirt 4 until it is stable, so it would be great
if the fix were backported to 3.6 or, ideally, developed for the next release
of the 3.6 branch. Weighing the urgency (it's a single point of failure)
against the complexity, the proposed fix shouldn't be hard to make.

I'm currently running a production environment on top of gluster replica 3,
and this is the only single point of failure I have.
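
To make the request concrete, here is a minimal sketch (Python, purely
illustrative) of how the existing entry point and the mount options quoted
below could be combined into one glusterfs mount. The function name
build_gluster_mount and the mount point /mnt/engine are my own assumptions for
the example; this is not how hosted-engine or vdsm implement it.

```python
def build_gluster_mount(entry_point, backup_server=None, extra_options=None):
    """Return (source, options_string) for a glusterfs mount.

    entry_point   -- "host:/volume", e.g. "gluster1.xyz.com:/engine"
    backup_server -- optional fallback host for fetching the volume file
    extra_options -- optional dict of further mount options
    """
    host, sep, volume = entry_point.partition(":")
    if not sep or not host or not volume.startswith("/"):
        raise ValueError("entry point must look like host:/volume")

    options = []
    if backup_server:
        # Option name as written in the proposal quoted in this thread.
        options.append("backupvolfile-server=" + backup_server)
    for key, value in (extra_options or {}).items():
        options.append(f"{key}={value}")
    return entry_point, ",".join(options)


source, opts = build_gluster_mount(
    "gluster1.xyz.com:/engine",
    backup_server="gluster2.xyz.com",
    extra_options={"fetch-attempts": 3, "log-level": "WARNING"},
)
# /mnt/engine is a hypothetical mount point used only for illustration.
print(f"mount -t glusterfs -o {opts} {source} /mnt/engine")
```

With no backup server and no extra options the options string is empty, which
matches what the deploy flow does today.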

Thanks
Luiz

On Fri, Apr 15, 2016 at 03:05, Sandro Bonazzola <sbonazzo at redhat.com>
wrote:

> On Thu, Apr 14, 2016 at 7:35 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>
>> On Wed, Apr 13, 2016 at 4:34 PM, Luiz Claudio Prazeres Goncalves
>> <luizcpg at gmail.com> wrote:
>> > Nir, here is the problem:
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1298693
>> >
>> > When you run hosted-engine --deploy and pick "glusterfs", there is no
>> > way to define the mount options, and therefore no way to use
>> > "backupvolfile-server". When you create a storage domain from the UI,
>> > however, you can, as in the attached screenshot.
>> >
>> >
>> > In the hosted-engine --deploy flow, I would expect a step that asks not
>> > only for the gluster entry point but also for the gluster mount options,
>> > which is missing today. The step would be optional, but it would remove
>> > the single point of failure described in bug 1298693.
>> >
>> > for example:
>> >
>> > Existing entry point on the "hosted-engine --deploy" flow
>> > gluster1.xyz.com:/engine
>>
>> I agree, this feature must be supported.
>>
>
> It will, and it's currently targeted to 4.0.
>
>
>
>>
>> > Missing option on the "hosted-engine --deploy" flow:
>> > backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log
>> >
>> > Sandro, this seems to me a simple change that can be made easily.
>> >
>> > What do you think?
>> >
>> > Regards
>> > -Luiz
>> >
>> >
>> >
>> > 2016-04-13 4:15 GMT-03:00 Sandro Bonazzola <sbonazzo at redhat.com>:
>> >>
>> >>
>> >>
>> >> On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>> >>>
>> >>> On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves
>> >>> <luizcpg at gmail.com> wrote:
>> >>> > Hi Sandro, I've been using gluster with 3 external hosts for a while
>> >>> > and things are working pretty well. However, removing this single
>> >>> > point of failure looks like a simple feature to implement, and it is
>> >>> > critical to anyone who wants to use gluster in production. This is
>> >>> > not hyperconvergence, which has other issues and implications. So why
>> >>> > not have this feature in the 3.6 branch? It looks like it only
>> >>> > requires letting vdsm use the 'backupvolfile-server' option when
>> >>> > mounting the engine domain, plus the proper tests.
>> >>>
>> >>> Can you explain what the problem is, and what the suggested solution is?
>> >>>
>> >>> Engine and vdsm already support the backupvolfile-server option - you
>> >>> can define this option in the storage domain options when you create a
>> >>> gluster storage domain. With this option vdsm should be able to connect
>> >>> to the gluster storage domain even if a brick is down.
>> >>>
>> >>> If you don't have this option in engine, you probably cannot add it
>> >>> with hosted engine setup, since to edit it you must put the storage
>> >>> domain in maintenance, and if you do this the engine VM will be
>> >>> killed :-) This is one of the issues with engine managing the storage
>> >>> domain it runs on.
>> >>>
>> >>> I think the best way to avoid this issue is to add a DNS entry
>> >>> providing the addresses of all the gluster bricks, and use this
>> >>> address for the gluster storage domain. This way the glusterfs mount
>> >>> helper can mount the domain even if one of the gluster bricks is down.
>> >>>
>> >>> Again, we will need some magic from the hosted engine developers to
>> >>> modify the address of the hosted engine gluster domain on an existing
>> >>> system.
>> >>
>> >>
>> >> Magic won't happen without a bz :-) please open one describing what's
>> >> requested.
>> >>
>> >>
>> >>>
>> >>>
>> >>> Nir
>> >>>
>> >>> >
>> >>> > Could you add this feature to the next release of 3.6 branch?
>> >>> >
>> >>> > Thanks
>> >>> > Luiz
>> >>> >
>> >>> > On Tue, Apr 12, 2016 at 05:03, Sandro Bonazzola
>> >>> > <sbonazzo at redhat.com> wrote:
>> >>> >>
>> >>> >> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl
>> >>> >> <dbond at nrggos.com.au> wrote:
>> >>> >>>
>> >>> >>> My setup is hyperconverged. I have placed my test results in
>> >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1298693
>> >>> >>>
>> >>> >>
>> >>> >> Ok, so you're aware of the single point of failure limitation.
>> >>> >> If you drop the host referenced in the hosted engine configuration
>> >>> >> for the initial setup, hosted engine won't be able to connect to
>> >>> >> shared storage even if the other hosts in the cluster are up, since
>> >>> >> the entry point is down.
>> >>> >> Note that hyperconverged deployment is not supported in 3.6.
>> >>> >>
>> >>> >>
>> >>> >>>
>> >>> >>>
>> >>> >>> Short description of setup:
>> >>> >>>
>> >>> >>> 3 hosts with 2 disks each, set up with gluster replica 3 across
>> >>> >>> the 6 disks; volume name hosted-engine.
>> >>> >>>
>> >>> >>> Hostname hosted-storage configured in /etc/hosts to point to host1.
>> >>> >>>
>> >>> >>> Installed hosted engine on host1 with the hosted engine storage
>> >>> >>> path = hosted-storage:/hosted-engine
>> >>> >>>
>> >>> >>> Install of first engine on h1 successful. Hosts h2 and h3 added to
>> >>> >>> the hosted engine. All works fine.
>> >>> >>>
>> >>> >>> Additional storage and non-hosted engine hosts added etc.
>> >>> >>>
>> >>> >>> Additional VMs added to hosted-engine storage (oVirt Reports VM and
>> >>> >>> Cinder VM). Additional VMs are hosted by other storage - cinder and
>> >>> >>> NFS.
>> >>> >>>
>> >>> >>> The system is in production.
>> >>> >>>
>> >>> >>>
>> >>> >>> Engine can be migrated around with the web interface.
>> >>> >>>
>> >>> >>>
>> >>> >>> - 3.6.4 upgrade released; followed the upgrade guide, engine is
>> >>> >>> upgraded first, new CentOS kernel requires host reboot.
>> >>> >>>
>> >>> >>> - Engine placed on h2 - h3 into maintenance (local), upgrade and
>> >>> >>> reboot h3 - No issues - Local maintenance removed from h3.
>> >>> >>>
>> >>> >>> - Engine placed on h3 - h2 into maintenance (local), upgrade and
>> >>> >>> reboot h2 - No issues - Local maintenance removed from h2.
>> >>> >>>
>> >>> >>> - Engine placed on h3 - h1 into maintenance (local), upgrade and
>> >>> >>> reboot h1 - engine crashes and does not start elsewhere, VM(cinder)
>> >>> >>> on h3 on same gluster volume pauses.
>> >>> >>>
>> >>> >>> - Host 1 takes about 5 minutes to reboot (Enterprise box with all
>> >>> >>> its normal BIOS probing)
>> >>> >>>
>> >>> >>> - Engine starts after h1 comes back and stabilises
>> >>> >>>
>> >>> >>> - VM(cinder) unpauses itself, VM(reports) continued fine the whole
>> >>> >>> time. I can do no diagnosis on the 2 VMs as the engine is not
>> >>> >>> available.
>> >>> >>>
>> >>> >>> - Local maintenance removed from h1
>> >>> >>>
>> >>> >>>
>> >>> >>> I don't believe the issue is with gluster itself, as the volume
>> >>> >>> remains accessible on all hosts during this time, albeit with a
>> >>> >>> missing server (gluster volume status) as each gluster server is
>> >>> >>> rebooted.
>> >>> >>>
>> >>> >>> Gluster was upgraded as part of the process; no issues were seen
>> >>> >>> here.
>> >>> >>>
>> >>> >>>
>> >>> >>> I have been able to duplicate the issue without the upgrade by
>> >>> >>> following the same sort of timeline.
>> >>> >>>
>> >>> >>>
>> >>> >>> ________________________________
>> >>> >>> From: Sandro Bonazzola <sbonazzo at redhat.com>
>> >>> >>> Sent: Monday, 11 April 2016 7:11 PM
>> >>> >>> To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak;
>> >>> >>> Sahina
>> >>> >>> Bose
>> >>> >>> Cc: Bond, Darryl; users
>> >>> >>> Subject: Re: [ovirt-users] Hosted engine on gluster problem
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck
>> >>> >>> <hawk at tbi.univie.ac.at> wrote:
>> >>> >>> Hi Darryl,
>> >>> >>>
>> >>> >>> I'm still experimenting with my oVirt installation so I tried to
>> >>> >>> recreate the problems you've described.
>> >>> >>>
>> >>> >>> My setup has three HA hosts for virtualization and three machines
>> >>> >>> for the gluster replica 3 setup.
>> >>> >>>
>> >>> >>> I manually migrated the Engine from the initial install host (one)
>> >>> >>> to host three. Then shut down host one manually and interrupted
>> >>> >>> the fencing mechanisms so the host stayed down. This didn't bother
>> >>> >>> the Engine VM at all.
>> >>> >>>
>> >>> >>> Did you move the host one to maintenance before shutting down?
>> >>> >>> Or is this a crash recovery test?
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> To make things a bit more challenging I then shut down host three
>> >>> >>> while running the Engine VM. Of course the Engine was down for
>> >>> >>> some time until host two detected the problem. It started the
>> >>> >>> Engine VM and everything seems to be running quite well without
>> >>> >>> the initial install host.
>> >>> >>>
>> >>> >>> Thanks for the feedback!
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> My only problem is that the HA agents on hosts two and three refuse
>> >>> >>> to start after a reboot because the configuration of the hosted
>> >>> >>> engine is missing. I wrote another mail to users at ovirt.org
>> >>> >>> about that.
>> >>> >>>
>> >>> >>> This is weird. Martin, Simone, can you please investigate this?
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> Cheers
>> >>> >>> Richard
>> >>> >>>
>> >>> >>> On 04/08/2016 01:38 AM, Bond, Darryl wrote:
>> >>> >>> > There seems to be a pretty severe bug with using hosted engine
>> >>> >>> > on gluster.
>> >>> >>> >
>> >>> >>> > If the host that was used as the initial hosted-engine --deploy
>> >>> >>> > host goes away, the engine VM will crash and cannot be restarted
>> >>> >>> > until the host comes back.
>> >>> >>>
>> >>> >>> Is this a hyperconverged setup?
>> >>> >>>
>> >>> >>>
>> >>> >>> >
>> >>> >>> > This is regardless of which host the engine was currently
>> >>> >>> > running on.
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > The issue seems to be buried in the bowels of VDSM and is not an
>> >>> >>> > issue with gluster itself.
>> >>> >>>
>> >>> >>> Sahina, can you please investigate this?
>> >>> >>>
>> >>> >>>
>> >>> >>> >
>> >>> >>> > The gluster filesystem is still accessible from the host that
>> >>> >>> > was running the engine. The issue has been submitted to bugzilla
>> >>> >>> > but the fix is some way off (4.1).
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > Can my hosted engine be converted to use NFS (using the gluster
>> >>> >>> > NFS server on the same filesystem) without rebuilding my hosted
>> >>> >>> > engine (i.e. change domainType=glusterfs to domainType=nfs)?
>> >>> >>>
>> >>> >>> >
>> >>> >>> > What effect would that have on the hosted-engine storage domain
>> >>> >>> > inside oVirt, i.e. would the same filesystem be mounted twice,
>> >>> >>> > or would it just break?
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > Will this actually fix the problem, or does it have the same
>> >>> >>> > issue when the hosted engine is on NFS?
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > Darryl
>> >>> >>> >
>> >>> >>> > _______________________________________________
>> >>> >>> > Users mailing list
>> >>> >>> > Users at ovirt.org
>> >>> >>> > http://lists.ovirt.org/mailman/listinfo/users
>> >>> >>> >
>> >>> >>>
>> >>> >>>
>> >>> >>> --
>> >>> >>> /dev/null
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> --
>> >>> >>> Sandro Bonazzola
>> >>> >>> Better technology. Faster innovation. Powered by community
>> >>> >>> collaboration.
>> >>> >>> See how it works at redhat.com
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >
>> >>> >
>> >>> >
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>>
>
>
>
>