On Fri, Aug 26, 2016 at 8:54 AM, Sandro Bonazzola <sbonazzo(a)redhat.com>
wrote:
On Tue, Aug 23, 2016 at 8:44 PM, David Gossage <dgossage(a)carouselchecks.com> wrote:
>
> On Fri, Apr 15, 2016 at 8:00 AM, Luiz Claudio Prazeres Goncalves <
> luizcpg(a)gmail.com> wrote:
>
>> I'm not planning to move to oVirt 4 until it gets stable, so it would be
>> great to backport this to 3.6 or, ideally, get it developed in the next
>> release of the 3.6 branch. Considering the urgency (it's a single point
>> of failure) versus the complexity, it wouldn't be hard to make the
>> proposed fix.
>>
>>
> Bumping an old email, sorry. Looks like
> https://bugzilla.redhat.com/show_bug.cgi?id=1298693 was finished against
> 3.6.7 according to that RFE.
>
> So does that mean that if I add the appropriate lines to my
> /etc/ovirt-hosted-engine/hosted-engine.conf, then the next time I restart
> the engine and the agent/brokers to mount that storage point, it will use
> the backupvol-server feature?
>
> If so, are the appropriate settings outlined in the docs somewhere?
>
> Running oVirt 3.6.7 and Gluster 3.8.2 on CentOS 7 nodes.
>
Adding Simone
First step, you have to edit /etc/ovirt-hosted-engine/hosted-engine.conf
on all your hosted-engine hosts to ensure that the storage field always
points to the same entry point (host01, for instance).
Then on each host you can add something like:
mnt_options=backupvolfile-server=host02.yourdomain.com:host03.yourdomain.com,log-level=WARNING,log-file=/var/log/engine_domain.log
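Putting the two settings together, the relevant part of
/etc/ovirt-hosted-engine/hosted-engine.conf on every host would then look
roughly like this (host names, volume path and log file are only
placeholders, adjust them to your environment):

  storage=host01.yourdomain.com:/engine
  mnt_options=backupvolfile-server=host02.yourdomain.com:host03.yourdomain.com,log-level=WARNING,log-file=/var/log/engine_domain.log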
Then check the representation of your storage connection in the
storage_server_connections table of the engine DB and make sure that the
connection refers to the entry point you used in hosted-engine.conf on all
your hosts; lastly, you have to set the value of mount_options there as well.
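A rough sketch of that update, run on the engine machine, could look like
the following (the connection and mount_options column names are from
memory and the WHERE clause is only illustrative, so please verify with a
SELECT first and take an engine-backup before touching the DB):

  sudo -u postgres psql engine -c "
    UPDATE storage_server_connections
       SET connection = 'host01.yourdomain.com:/engine',
           mount_options = 'backupvolfile-server=host02.yourdomain.com:host03.yourdomain.com'
     WHERE connection LIKE '%:/engine';"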
Please also tune the value of network.ping-timeout for your GlusterFS
volume to avoid this:
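(On the Gluster side that tuning would be something like the following; the
volume name and the timeout value are only examples, the default is 42
seconds and a lower value lets clients give up on an unreachable brick
sooner:

  gluster volume get engine network.ping-timeout
  gluster volume set engine network.ping-timeout 10
)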
>
>
>> I'm using today a production environment on top of gluster replica 3 and
>> this is the only SPOF I have.
>>
>> Thanks
>> Luiz
>>
>> On Fri, 15 Apr 2016 at 03:05, Sandro Bonazzola <sbonazzo(a)redhat.com>
>> wrote:
>>
>>> On Thu, Apr 14, 2016 at 7:35 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>
>>>> On Wed, Apr 13, 2016 at 4:34 PM, Luiz Claudio Prazeres Goncalves
>>>> <luizcpg(a)gmail.com> wrote:
>>>> > Nir, here is the problem:
>>>> >
>>>> > https://bugzilla.redhat.com/show_bug.cgi?id=1298693
>>>> >
>>>> > When you do a hosted-engine --deploy and pick "glusterfs" you don't
>>>> > have a way to define the mount options, and therefore to use
>>>> > "backupvol-server"; however, when you create a storage domain from
>>>> > the UI you can, like the attached screen shot.
>>>> >
>>>> >
>>>> > In the hosted-engine --deploy, I would expect a flow which includes
>>>> > not only the "gluster" entry point, but also the gluster mount
>>>> > options, which are missing today. This option would be optional, but
>>>> > would remove the single point of failure described in Bug 1298693.
>>>> >
>>>> > for example:
>>>> >
>>>> > Existing entry point on the "hosted-engine --deploy" flow
>>>> > gluster1.xyz.com:/engine
>>>>
>>>> I agree, this feature must be supported.
>>>>
>>>
>>> It will, and it's currently targeted to 4.0.
>>>
>>>
>>>
>>>>
>>>> > Missing option on the "hosted-engine --deploy" flow:
>>>> > backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log
>>>> >
>>>> > Sandro, it seems to me a simple solution which could be implemented easily.
>>>> >
>>>> > What do you think?
>>>> >
>>>> > Regards
>>>> > -Luiz
>>>> >
>>>> >
>>>> >
>>>> > 2016-04-13 4:15 GMT-03:00 Sandro Bonazzola <sbonazzo(a)redhat.com>:
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer(a)redhat.com>
>>>> >> wrote:
>>>> >>>
>>>> >>> On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves
>>>> >>> <luizcpg(a)gmail.com> wrote:
>>>> >>> > Hi Sandro, I've been using gluster with 3 external hosts for a
>>>> >>> > while and things are working pretty well; however, this single
>>>> >>> > point of failure looks like a simple feature to implement, but
>>>> >>> > critical to anyone who wants to use gluster in production. This
>>>> >>> > is not hyperconvergence, which has other issues/implications.
>>>> >>> > So, why not have this feature out on the 3.6 branch? It looks
>>>> >>> > like just letting vdsm use the 'backupvol-server' option when
>>>> >>> > mounting the engine domain and making the property tests.
>>>> >>>
>>>> >>> Can you explain what the problem is, and what the suggested
>>>> >>> solution is?
>>>> >>>
>>>> >>> Engine and vdsm already support the backupvol-server option - you
>>>> >>> can define this option in the storage domain options when you
>>>> >>> create a gluster storage domain. With this option vdsm should be
>>>> >>> able to connect to the gluster storage domain even if a brick is
>>>> >>> down.
>>>> >>>
>>>> >>> If you don't have this option in engine, you probably cannot add
>>>> >>> it with hosted engine setup, since for editing it you must put the
>>>> >>> storage domain in maintenance, and if you do this the engine vm
>>>> >>> will be killed :-) This is one of the issues with engine managing
>>>> >>> the storage domain it runs on.
>>>> >>>
>>>> >>> I think the best way to avoid this issue is to add a DNS entry
>>>> >>> providing the addresses of all the gluster bricks, and use this
>>>> >>> address for the gluster storage domain. This way the glusterfs
>>>> >>> mount helper can mount the domain even if one of the gluster
>>>> >>> bricks is down.
>>>> >>>
>>>> >>> Again, we will need some magic from the hosted engine developers
>>>> >>> to modify the address of the hosted engine gluster domain on an
>>>> >>> existing system.
>>>> >>
>>>> >>
>>>> >> Magic won't happen without a bz :-) Please open one describing
>>>> >> what's requested.
>>>> >>
>>>> >>
>>>> >>>
>>>> >>>
>>>> >>> Nir
>>>> >>>
>>>> >>> >
>>>> >>> > Could you add this feature to the next release of the 3.6
>>>> >>> > branch?
>>>> >>> >
>>>> >>> > Thanks
>>>> >>> > Luiz
>>>> >>> >
>>>> >>> > On Tue, 12 Apr 2016 at 05:03, Sandro Bonazzola
>>>> >>> > <sbonazzo(a)redhat.com> wrote:
>>>> >>> >>
>>>> >>> >> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl
>>>> >>> >> <dbond(a)nrggos.com.au> wrote:
>>>> >>> >>>
>>>> >>> >>> My setup is hyperconverged. I have placed my test results in
>>>> >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1298693
>>>> >>> >>>
>>>> >>> >>
>>>> >>> >> Ok, so you're aware of the limitation of the single point of
>>>> >>> >> failure. If you drop the host referenced in the hosted engine
>>>> >>> >> configuration for the initial setup, it won't be able to
>>>> >>> >> connect to shared storage even if the other hosts in the
>>>> >>> >> cluster are up, since the entry point is down.
>>>> >>> >> Note that hyperconverged deployment is not supported in 3.6.
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> Short description of setup:
>>>> >>> >>>
>>>> >>> >>> 3 hosts with 2 disks each, set up with gluster replica 3
>>>> >>> >>> across the 6 disks; volume name hosted-engine.
>>>> >>> >>>
>>>> >>> >>> Hostname hosted-storage configured in /etc/hosts to point to
>>>> >>> >>> host1.
>>>> >>> >>>
>>>> >>> >>> Installed hosted engine on host1 with the hosted engine
>>>> >>> >>> storage path = hosted-storage:/hosted-engine
>>>> >>> >>>
>>>> >>> >>> Install of the first engine on h1 was successful. Hosts h2
>>>> >>> >>> and h3 added to the hosted engine. All works fine.
>>>> >>> >>>
>>>> >>> >>> Additional storage and non-hosted-engine hosts added, etc.
>>>> >>> >>>
>>>> >>> >>> Additional VMs added to hosted-engine storage (oVirt Reports
>>>> >>> >>> VM and Cinder VM). Additional VMs are hosted by other storage -
>>>> >>> >>> cinder and NFS.
>>>> >>> >>>
>>>> >>> >>> The system is in production.
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> Engine can be migrated around with the web interface.
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> - 3.6.4 upgrade released; followed the upgrade guide, engine
>>>> >>> >>> is upgraded first, new CentOS kernel requires host reboot.
>>>> >>> >>>
>>>> >>> >>> - Engine placed on h2 - h3 into maintenance (local), upgrade
>>>> >>> >>> and reboot h3 - no issues - local maintenance removed from h3.
>>>> >>> >>>
>>>> >>> >>> - Engine placed on h3 - h2 into maintenance (local), upgrade
>>>> >>> >>> and reboot h2 - no issues - local maintenance removed from h2.
>>>> >>> >>>
>>>> >>> >>> - Engine placed on h3 - h1 into maintenance (local), upgrade
>>>> >>> >>> and reboot h1 - engine crashes and does not start elsewhere,
>>>> >>> >>> VM(cinder) on h3 on the same gluster volume pauses.
>>>> >>> >>>
>>>> >>> >>> - Host 1 takes about 5 minutes to reboot (enterprise box with
>>>> >>> >>> all its normal BIOS probing).
>>>> >>> >>>
>>>> >>> >>> - Engine starts after h1 comes back and stabilises.
>>>> >>> >>>
>>>> >>> >>> - VM(cinder) unpauses itself, VM(reports) continued fine the
>>>> >>> >>> whole time. I can do no diagnosis on the 2 VMs as the engine
>>>> >>> >>> is not available.
>>>> >>> >>>
>>>> >>> >>> - Local maintenance removed from h1.
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> I don't believe the issue is with gluster itself, as the
>>>> >>> >>> volume remains accessible on all hosts during this time,
>>>> >>> >>> albeit with a missing server (gluster volume status) as each
>>>> >>> >>> gluster server is rebooted.
>>>> >>> >>>
>>>> >>> >>> Gluster was upgraded as part of the process; no issues were
>>>> >>> >>> seen here.
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> I have been able to duplicate the issue without the upgrade
>>>> >>> >>> by following the same sort of timeline.
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> ________________________________
>>>> >>> >>> From: Sandro Bonazzola <sbonazzo(a)redhat.com>
>>>> >>> >>> Sent: Monday, 11 April 2016 7:11 PM
>>>> >>> >>> To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin
>>>> >>> >>> Sivak; Sahina Bose
>>>> >>> >>> Cc: Bond, Darryl; users
>>>> >>> >>> Subject: Re: [ovirt-users] Hosted engine on gluster problem
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck
>>>> >>> >>> <hawk@tbi.univie.ac.at> wrote:
>>>> >>> >>> Hi Darryl,
>>>> >>> >>>
>>>> >>> >>> I'm still experimenting with my oVirt installation so I tried
>>>> >>> >>> to recreate the problems you've described.
>>>> >>> >>>
>>>> >>> >>> My setup has three HA hosts for virtualization and three
>>>> >>> >>> machines for the gluster replica 3 setup.
>>>> >>> >>>
>>>> >>> >>> I manually migrated the Engine from the initial install host
>>>> >>> >>> (one) to host three. Then I shut down host one manually and
>>>> >>> >>> interrupted the fencing mechanisms so the host stayed down.
>>>> >>> >>> This didn't bother the Engine VM at all.
>>>> >>> >>>
>>>> >>> >>> Did you move host one to maintenance before shutting it down?
>>>> >>> >>> Or is this a crash recovery test?
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> To make things a bit more challenging I then shut down host
>>>> >>> >>> three while it was running the Engine VM. Of course the Engine
>>>> >>> >>> was down for some time until host two detected the problem.
>>>> >>> >>> It started the Engine VM and everything seems to be running
>>>> >>> >>> quite well without the initial install host.
>>>> >>> >>>
>>>> >>> >>> Thanks for the feedback!
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> My only problem is that the HA agents on hosts two and three
>>>> >>> >>> refuse to start after a reboot due to the fact that the
>>>> >>> >>> configuration of the hosted engine is missing. I wrote another
>>>> >>> >>> mail to users@ovirt.org about that.
>>>> >>> >>>
>>>> >>> >>> This is weird. Martin, Simone, can you please investigate
>>>> >>> >>> this?
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> Cheers
>>>> >>> >>> Richard
>>>> >>> >>>
>>>> >>> >>> On 04/08/2016 01:38 AM, Bond, Darryl wrote:
>>>> >>> >>> > There seems to be a pretty severe bug with using hosted
>>>> >>> >>> > engine on gluster.
>>>> >>> >>> >
>>>> >>> >>> > If the host that was used as the initial hosted-engine
>>>> >>> >>> > --deploy host goes away, the engine VM will crash and cannot
>>>> >>> >>> > be restarted until the host comes back.
>>>> >>> >>>
>>>> >>> >>> Is this a hyperconverged setup?
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> >
>>>> >>> >>> > This is regardless of which host the engine was currently
>>>> >>> >>> > running on.
>>>> >>> >>> >
>>>> >>> >>> > The issue seems to be buried in the bowels of VDSM and is
>>>> >>> >>> > not an issue with gluster itself.
>>>> >>> >>>
>>>> >>> >>> Sahina, can you please investigate this?
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> >
>>>> >>> >>> > The gluster filesystem is still accessible from the host
>>>> >>> >>> > that was running the engine. The issue has been submitted to
>>>> >>> >>> > bugzilla, but the fix is some way off (4.1).
>>>> >>> >>> >
>>>> >>> >>> > Can my hosted engine be converted to use NFS (using the
>>>> >>> >>> > gluster NFS server on the same filesystem) without rebuilding
>>>> >>> >>> > my hosted engine (i.e. change domainType=glusterfs to
>>>> >>> >>> > domainType=nfs)?
>>>> >>> >>>
>>>> >>> >>> >
>>>> >>> >>> > What effect would that have on the hosted-engine storage
>>>> >>> >>> > domain inside oVirt, i.e. would the same filesystem be
>>>> >>> >>> > mounted twice or would it just break?
>>>> >>> >>> >
>>>> >>> >>> > Will this actually fix the problem, or does it have the same
>>>> >>> >>> > issue when the hosted engine is on NFS?
>>>> >>> >>> >
>>>> >>> >>> >
>>>> >>> >>> > Darryl
>>>> >>> >>> >
>>>> >>> >>> >
>>>> >>> >>> >
>>>> >>> >>> >
>>>> >>> >>> >
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> --
>>>> >>> >>> /dev/null
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>>
>>>> >>> >>> --
>>>> >>> >>> Sandro Bonazzola
>>>> >>> >>> Better technology. Faster innovation. Powered by community collaboration.
>>>> >>> >>> See how it works at redhat.com
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> --
>>>> >>> >> Sandro Bonazzola
>>>> >>> >> Better technology. Faster innovation. Powered by community collaboration.
>>>> >>> >> See how it works at redhat.com
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Sandro Bonazzola
>>>> >> Better technology. Faster innovation. Powered by community collaboration.
>>>> >> See how it works at redhat.com
>>>> >
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Sandro Bonazzola
>>> Better technology. Faster innovation. Powered by community
>>> collaboration.
>>> See how it works at redhat.com
>>>
>>
>>
>>
>
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com