I'm not planning to move to oVirt 4 until it is stable, so it would be great
to backport this to 3.6 or, ideally, have it developed in the next release of
the 3.6 branch. Weighing the urgency (it's a single point of failure) against
the complexity, the proposed fix shouldn't be hard to make.
I'm running a production environment today on top of gluster replica 3, and
this is the only SPOF I have.
Thanks
Luiz
On Fri, Apr 15, 2016 at 03:05, Sandro Bonazzola <sbonazzo(a)redhat.com>
wrote:
On Thu, Apr 14, 2016 at 7:35 PM, Nir Soffer
<nsoffer(a)redhat.com> wrote:
> On Wed, Apr 13, 2016 at 4:34 PM, Luiz Claudio Prazeres Goncalves
> <luizcpg(a)gmail.com> wrote:
> > Nir, here is the problem:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1298693
> >
> > When you do a hosted-engine --deploy and pick "glusterfs" you don't have
> > a way to define the mount options and, therefore, to use the
> > "backupvol-server"; however, when you create a storage domain from the
> > UI you can, as in the attached screenshot.
> >
> >
> > In the hosted-engine --deploy, I would expect a flow which includes not
> > only the "gluster" entry point, but also the gluster mount options, which
> > are missing today. This option would be optional, but would remove the
> > single point of failure described in Bug 1298693.
> >
> > for example:
> >
> > Existing entry point on the "hosted-engine --deploy" flow
> > gluster1.xyz.com:/engine
>
> I agree, this feature must be supported.
>
It will, and it's currently targeted to 4.0.
>
> > Missing option on the "hosted-engine --deploy" flow:
> >
> > backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log
> >
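As a rough sketch of what those options amount to, the equivalent manual
mount would be something along these lines (the /mnt/engine mount point is
only a placeholder here; newer glusterfs releases also accept a plural
backup-volfile-servers=host2:host3 form):

    # mount the engine volume from gluster1, falling back to gluster2 if it is unreachable
    mount -t glusterfs \
      -o backupvolfile-server=gluster2.xyz.com,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log \
      gluster1.xyz.com:/engine /mnt/engine
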
> > Sandro, it seems to me a simple issue which can be easily fixed.
> >
> > What do you think?
> >
> > Regards
> > -Luiz
> >
> >
> >
> > 2016-04-13 4:15 GMT-03:00 Sandro Bonazzola <sbonazzo(a)redhat.com>:
> >>
> >>
> >>
> >> On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer(a)redhat.com>
> >> wrote:
> >>>
> >>> On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves
> >>> <luizcpg(a)gmail.com> wrote:
> >>> > Hi Sandro, I've been using gluster with 3 external hosts for a while
> >>> > and things are working pretty well, however this single point of
> >>> > failure looks like a simple feature to implement, but critical to
> >>> > anyone who wants to use gluster in production. This is not
> >>> > hyperconvergence, which has other issues/implications. So, why not
> >>> > have this feature out on the 3.6 branch? It looks like just letting
> >>> > vdsm use the 'backupvol-server' option when mounting the engine
> >>> > domain and making the proper tests.
> >>>
> >>> Can you explain what the problem is, and what the suggested solution
> >>> is?
> >>>
> >>> Engine and vdsm already support the backupvol-server option - you can
> >>> define this option in the storage domain options when you create a
> >>> gluster storage domain. With this option vdsm should be able to connect
> >>> to a gluster storage domain even if a brick is down.
> >>>
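For example, when creating a gluster storage domain from the UI, the
Mount Options field can take a value along these lines (the hostname is a
placeholder; vdsm passes the string through to the glusterfs mount helper,
so in principle any option the helper understands can be used):

    backupvolfile-server=gluster2.xyz.com,log-level=WARNING
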
> >>> If you don't have this option in engine, you probably cannot add it
> >>> with hosted engine setup, since to edit it you must put the storage
> >>> domain in maintenance, and if you do this the engine vm will be killed
> >>> :-) This is one of the issues with the engine managing the storage
> >>> domain it runs on.
> >>>
> >>> I think the best way to avoid this issue is to add a DNS entry
> >>> providing the addresses of all the gluster bricks, and use this address
> >>> for the gluster storage domain. This way the glusterfs mount helper can
> >>> mount the domain even if one of the gluster bricks is down.
> >>>
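A sketch of what such an entry could look like, written as round-robin A
records in a BIND-style zone file (the name and addresses are placeholders):

    ; one name resolving to every gluster brick host
    gluster-engine    IN  A  10.0.0.11
    gluster-engine    IN  A  10.0.0.12
    gluster-engine    IN  A  10.0.0.13

The storage domain would then be defined as gluster-engine.xyz.com:/engine
rather than pointing at a single brick host.
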
> >>> Again, we will need some magic from the hosted engine developers to
> >>> modify the address of the hosted engine gluster domain on existing
> >>> systems.
> >>
> >>
> >> Magic won't happen without a bz :-) please open one describing what's
> >> requested.
> >>
> >>
> >>>
> >>>
> >>> Nir
> >>>
> >>> >
> >>> > Could you add this feature to the next release of 3.6 branch?
> >>> >
> >>> > Thanks
> >>> > Luiz
> >>> >
> >>> > On Tue, Apr 12, 2016 at 05:03, Sandro Bonazzola
> >>> > <sbonazzo(a)redhat.com> wrote:
> >>> >>
> >>> >> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl
> >>> >> <dbond(a)nrggos.com.au> wrote:
> >>> >>>
> >>> >>> My setup is hyperconverged. I have placed my test results in
> >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1298693
> >>> >>>
> >>> >>
> >>> >> Ok, so you're aware of the limitation of the single point of
> >>> >> failure. If you drop the host referenced in the hosted engine
> >>> >> configuration for the initial setup, it won't be able to connect to
> >>> >> shared storage even if the other hosts in the cluster are up, since
> >>> >> the entry point is down.
> >>> >> Note that hyperconverged deployment is not supported in 3.6.
> >>> >>
> >>> >>
> >>> >>>
> >>> >>>
> >>> >>> Short description of setup:
> >>> >>>
> >>> >>> 3 hosts with 2 disks each, set up with gluster replica 3 across the
> >>> >>> 6 disks; volume name hosted-engine.
> >>> >>>
> >>> >>> Hostname hosted-storage configured in /etc/hosts to point to
> >>> >>> host1.
> >>> >>>
> >>> >>> Installed hosted engine on host1 with the hosted engine storage
> >>> >>> path = hosted-storage:/hosted-engine
> >>> >>>
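For illustration, the /etc/hosts entry behind this would be a single line of
the form (the address is a placeholder):

    10.0.0.1    hosted-storage

so hosted-storage:/hosted-engine resolves to host1 until the entry is edited
by hand, which is what makes host1 the entry point discussed above.
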
> >>> >>> Install first engine on h1 successful. Hosts h2 and h3 added to the
> >>> >>> hosted engine. All works fine.
> >>> >>>
> >>> >>> Additional storage and non-hosted engine hosts added etc.
> >>> >>>
> >>> >>> Additional VMs added to hosted-engine storage (oVirt Reports VM and
> >>> >>> Cinder VM). Additional VMs are hosted by other storage - cinder and
> >>> >>> NFS.
> >>> >>>
> >>> >>> The system is in production.
> >>> >>>
> >>> >>>
> >>> >>> Engine can be migrated around with the web interface.
> >>> >>>
> >>> >>>
> >>> >>> - 3.6.4 upgrade released; followed the upgrade guide, engine is
> >>> >>> upgraded first, new CentOS kernel requires host reboot.
> >>> >>>
> >>> >>> - Engine placed on h2 - h3 into maintenance (local), upgrade and
> >>> >>> reboot h3 - No issues - Local maintenance removed from h3.
> >>> >>>
> >>> >>> - Engine placed on h3 - h2 into maintenance (local), upgrade and
> >>> >>> reboot h2 - No issues - Local maintenance removed from h2.
> >>> >>>
> >>> >>> - Engine placed on h3 - h1 into maintenance (local), upgrade and
> >>> >>> reboot h1 - engine crashes and does not start elsewhere; VM(cinder)
> >>> >>> on h3 on the same gluster volume pauses.
> >>> >>>
> >>> >>> - Host 1 takes about 5 minutes to reboot (enterprise box with all
> >>> >>> its normal BIOS probing)
> >>> >>>
> >>> >>> - Engine starts after h1 comes back and stabilises
> >>> >>>
> >>> >>> - VM(cinder) unpauses itself; VM(reports) continued fine the whole
> >>> >>> time. I can do no diagnosis on the 2 VMs as the engine is not
> >>> >>> available.
> >>> >>>
> >>> >>> - Local maintenance removed from h1
> >>> >>>
> >>> >>>
> >>> >>> I don't believe the issue is with gluster itself, as the volume
> >>> >>> remains accessible on all hosts during this time, albeit with a
> >>> >>> missing server (gluster volume status) as each gluster server is
> >>> >>> rebooted.
> >>> >>>
> >>> >>> Gluster was upgraded as part of the process; no issues were seen
> >>> >>> here.
> >>> >>>
> >>> >>>
> >>> >>> I have been able to duplicate the issue without the upgrade by
> >>> >>> following the same sort of timeline.
> >>> >>>
> >>> >>>
> >>> >>> ________________________________
> >>> >>> From: Sandro Bonazzola <sbonazzo(a)redhat.com>
> >>> >>> Sent: Monday, 11 April 2016 7:11 PM
> >>> >>> To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak;
> >>> >>> Sahina Bose
> >>> >>> Cc: Bond, Darryl; users
> >>> >>> Subject: Re: [ovirt-users] Hosted engine on gluster problem
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck
> >>> >>> <hawk@tbi.univie.ac.at> wrote:
> >>> >>> Hi Darryl,
> >>> >>>
> >>> >>> I'm still experimenting with my oVirt installation so I tried to
> >>> >>> recreate the problems you've described.
> >>> >>>
> >>> >>> My setup has three HA hosts for virtualization and three machines
> >>> >>> for the gluster replica 3 setup.
> >>> >>>
> >>> >>> I manually migrated the Engine from the initial install host (one)
> >>> >>> to host three. Then shut down host one manually and interrupted the
> >>> >>> fencing mechanisms so the host stayed down. This didn't bother the
> >>> >>> Engine VM at all.
> >>> >>>
> >>> >>> Did you move host one to maintenance before shutting it down?
> >>> >>> Or is this a crash recovery test?
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> To make things a bit more challenging I then shut down host three
> >>> >>> while running the Engine VM. Of course the Engine was down for
> >>> >>> some time until host two detected the problem. It started the
> >>> >>> Engine VM and everything seems to be running quite well without
> >>> >>> the initial install host.
> >>> >>>
> >>> >>> Thanks for the feedback!
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> My only problem is that the HA agents on hosts two and three
> >>> >>> refuse to start after a reboot due to the fact that the
> >>> >>> configuration of the hosted engine is missing. I wrote another
> >>> >>> mail to users@ovirt.org about that.
> >>> >>>
> >>> >>> This is weird. Martin, Simone, can you please investigate this?
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> Cheers
> >>> >>> Richard
> >>> >>>
> >>> >>> On 04/08/2016 01:38 AM, Bond, Darryl wrote:
> >>> >>> > There seems to be a pretty severe bug with using hosted engine
> >>> >>> > on gluster.
> >>> >>> >
> >>> >>> > If the host that was used as the initial hosted-engine --deploy
> >>> >>> > host goes away, the engine VM will crash and cannot be restarted
> >>> >>> > until the host comes back.
> >>> >>>
> >>> >>> is this a hyperconverged setup?
> >>> >>>
> >>> >>>
> >>> >>> >
> >>> >>> > This is regardless of which host the engine was currently
> >>> >>> > running on.
> >>> >>> >
> >>> >>> >
> >>> >>> > The issue seems to be buried in the bowels of VDSM and is not
> >>> >>> > an issue with gluster itself.
> >>> >>>
> >>> >>> Sahina, can you please investigate this?
> >>> >>>
> >>> >>>
> >>> >>> >
> >>> >>> > The gluster filesystem is still accessible from the host that
> >>> >>> > was running the engine. The issue has been submitted to
> >>> >>> > bugzilla, but the fix is some way off (4.1).
> >>> >>> >
> >>> >>> >
> >>> >>> > Can my hosted engine be converted to use NFS (using the gluster
> >>> >>> > NFS server on the same filesystem) without rebuilding my hosted
> >>> >>> > engine (i.e. change domainType=glusterfs to domainType=nfs)?
> >>> >>>
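For context, domainType lives in the hosted engine configuration read by the
HA agents on each host, typically /etc/ovirt-hosted-engine/hosted-engine.conf;
a sketch of the relevant lines (the exact path and key set may vary between
versions):

    # storage entry point and domain type used by the hosted engine HA agents
    storage=hosted-storage:/hosted-engine
    domainType=glusterfs

Presumably switching domainType to nfs would also mean pointing storage= at
an NFS-mountable path and restarting the agents; whether oVirt then treats it
as the same storage domain is exactly the question raised here.
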
> >>> >>> >
> >>> >>> > What effect would that have on the hosted-engine storage domain
> >>> >>> > inside oVirt, i.e. would the same filesystem be mounted twice or
> >>> >>> > would it just break?
> >>> >>> >
> >>> >>> >
> >>> >>> > Will this actually fix the problem, or does it have the same
> >>> >>> > issue when the hosted engine is on NFS?
> >>> >>> >
> >>> >>> >
> >>> >>> > Darryl
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> >
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com