
On Mon, Aug 29, 2016 at 10:47 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Fri, Aug 26, 2016 at 8:54 AM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
On Tue, Aug 23, 2016 at 8:44 PM, David Gossage <dgossage@carouselchecks.com> wrote:
On Fri, Apr 15, 2016 at 8:00 AM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote:
I'm not planning to move to oVirt 4 until it gets stable, so it would be great to backport this to 3.6 or, ideally, have it developed in the next release of the 3.6 branch. Considering the urgency (it's a single point of failure) versus the complexity, it wouldn't be hard to make the proposed fix.
Bumping an old email, sorry. It looks like https://bugzilla.redhat.com/show_bug.cgi?id=1298693 was finished against 3.6.7, according to that RFE.
So does that mean that if I add the appropriate lines to my /etc/ovirt-hosted-engine/hosted-engine.conf, then the next time I restart the engine and the agent/broker to mount that storage point, it will use the backupvol-server feature?
If so, are the appropriate settings outlined in the docs somewhere?
Running oVirt 3.6.7 and Gluster 3.8.2 on CentOS 7 nodes.
Adding Simone
First step: you have to edit /etc/ovirt-hosted-engine/hosted-engine.conf on all your hosted-engine hosts to ensure that the storage field always points to the same entry point (host01, for instance). Then on each host you can add something like:
mnt_options=backupvolfile-server=host02.yourdomain.com:host03.yourdomain.com,fetch-attempts=2,log-level=WARNING,log-file=/var/log/engine_domain.log
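For clarity, the relevant part of the file could then look roughly like this (just a sketch: the hostnames, volume path and log location are placeholders, and the storage value must be the entry point you actually deployed with):

  # /etc/ovirt-hosted-engine/hosted-engine.conf (excerpt, placeholder values)
  # 'storage' must be identical on every hosted-engine host
  storage=host01.yourdomain.com:/engine
  # extra GlusterFS mount options: try host02/host03 for the volfile if host01 is down
  mnt_options=backupvolfile-server=host02.yourdomain.com:host03.yourdomain.com,fetch-attempts=2,log-level=WARNING,log-file=/var/log/engine_domain.log

After editing, restarting ovirt-ha-broker and ovirt-ha-agent on that host should make the new mount options be used the next time the storage is connected.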
Then check the representation of your storage connection in the storage_server_connections table of the engine DB and make sure that the connection refers to the same entry point you used in hosted-engine.conf on all your hosts; finally, you also have to set the value of mount_options there.
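Something along these lines, purely as a sketch (run on the engine VM; the UUID is a placeholder, the id column is assumed to be the connection's UUID, and it's wise to back up the engine DB before changing anything):

  # open a psql shell on the engine database
  su - postgres -c "psql engine"

  -- inspect the existing connections
  SELECT id, connection, mount_options FROM storage_server_connections;

  -- point the hosted-engine connection at the same entry point and set the mount options
  UPDATE storage_server_connections
     SET connection = 'host01.yourdomain.com:/engine',
         mount_options = 'backupvolfile-server=host02.yourdomain.com:host03.yourdomain.com,fetch-attempts=2,log-level=WARNING,log-file=/var/log/engine_domain.log'
   WHERE id = '<uuid-of-the-hosted-engine-storage-connection>';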
Please also tune the value of network.ping-timeout for your GlusterFS volume to avoid this: https://bugzilla.redhat.com/show_bug.cgi?id=1319657#c17
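For example (the volume name and the value here are only placeholders; see the comment in the bug above for a discussion of suitable values):

  # check the current setting and lower it on the volume backing the engine domain
  gluster volume get <engine_volume> network.ping-timeout
  gluster volume set <engine_volume> network.ping-timeout 10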
You can find other information here: https://www.ovirt.org/develop/release-management/features/engine/self-hosted-engine-gluster-support/
Thanks, I'll review all that information.
Today I'm using a production environment on top of gluster replica 3, and this is the only SPOF (single point of failure) I have.
Thanks,
Luiz
On Fri, Apr 15, 2016 at 03:05, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
On Thu, Apr 14, 2016 at 7:35 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Apr 13, 2016 at 4:34 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote:
> Nir, here is the problem: https://bugzilla.redhat.com/show_bug.cgi?id=1298693
>
> When you do a hosted-engine --deploy and pick "glusterfs" you don't have a way to define the mount options and, therefore, to use "backupvol-server"; however, when you create a storage domain from the UI you can, as in the attached screenshot.
>
> In the hosted-engine --deploy, I would expect a flow which includes not only the "gluster" entry point but also the gluster mount options, which are missing today. The option would be optional, but it would remove the single point of failure described in Bug 1298693.
>
> For example, the existing entry point on the "hosted-engine --deploy" flow:
>   gluster1.xyz.com:/engine
I agree, this feature must be supported.
It will, and it's currently targeted to 4.0.
> The missing option on the "hosted-engine --deploy" flow:
>   backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log
>
> Sandro, it seems to me a simple solution which can be easily fixed.
>
> What do you think?
>
> Regards,
> -Luiz
>
> 2016-04-13 4:15 GMT-03:00 Sandro Bonazzola <sbonazzo@redhat.com>:
>> On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer@redhat.com> wrote:
>>> On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote:
>>> > Hi Sandro, I've been using gluster with 3 external hosts for a while and things are working pretty well; however, this single point of failure looks like a simple feature to implement, but it is critical to anyone who wants to use gluster in production. This is not hyperconvergence, which has other issues/implications. So, why not have this feature out on the 3.6 branch? It looks like it just means letting vdsm use the 'backupvol-server' option when mounting the engine domain and making the proper tests.
>>>
>>> Can you explain what the problem is, and what the suggested solution is?
>>>
>>> Engine and vdsm already support the backupvol-server option - you can define this option in the storage domain options when you create a gluster storage domain. With this option vdsm should be able to connect to the gluster storage domain even if a brick is down.
>>>
>>> If you don't have this option in the engine, you probably cannot add it with hosted-engine setup, since for editing it you must put the storage domain in maintenance, and if you do this the engine VM will be killed :-) This is one of the issues with the engine managing the storage domain it runs on.
>>>
>>> I think the best way to avoid this issue is to add a DNS entry providing the addresses of all the gluster bricks, and to use this address for the gluster storage domain. This way the glusterfs mount helper can mount the domain even if one of the gluster bricks is down.
>>>
>>> Again, we will need some magic from the hosted-engine developers to modify the address of the hosted-engine gluster domain on an existing system.
>>
>> Magic won't happen without a bz :-) please open one describing what's requested.
>>
>>> Nir
>>>
>>> > Could you add this feature to the next release of the 3.6 branch?
>>> >
>>> > Thanks
>>> > Luiz
>>> >
>>> > On Tue, Apr 12, 2016 at 05:03, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
>>> >> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl <dbond@nrggos.com.au> wrote:
>>> >>> My setup is hyperconverged. I have placed my test results in https://bugzilla.redhat.com/show_bug.cgi?id=1298693
>>> >>
>>> >> Ok, so you're aware of the limitation of the single point of failure. If you drop the host referenced in the hosted-engine configuration for the initial setup, it won't be able to connect to shared storage even if the other hosts in the cluster are up, since the entry point is down. Note that hyperconverged deployment is not supported in 3.6.
>>> >>
>>> >>> Short description of setup:
>>> >>>
>>> >>> 3 hosts with 2 disks each, set up with gluster replica 3 across the 6 disks, volume name hosted-engine.
>>> >>>
>>> >>> Hostname hosted-storage configured in /etc/hosts to point to host1.
>>> >>> Installed hosted engine on host1 with the hosted engine storage path = hosted-storage:/hosted-engine
>>> >>>
>>> >>> Install of the first engine on h1 was successful. Hosts h2 and h3 added to the hosted engine. All works fine.
>>> >>>
>>> >>> Additional storage and non-hosted-engine hosts added, etc.
>>> >>>
>>> >>> Additional VMs added to hosted-engine storage (oVirt Reports VM and Cinder VM). Additional VMs are hosted by other storage - Cinder and NFS.
>>> >>>
>>> >>> The system is in production.
>>> >>>
>>> >>> The engine can be migrated around with the web interface.
>>> >>>
>>> >>> - 3.6.4 upgrade released; following the upgrade guide, the engine is upgraded first, and the new CentOS kernel requires a host reboot.
>>> >>> - Engine placed on h2 - h3 into maintenance (local), upgrade and reboot h3 - no issues - local maintenance removed from h3.
>>> >>> - Engine placed on h3 - h2 into maintenance (local), upgrade and reboot h2 - no issues - local maintenance removed from h2.
>>> >>> - Engine placed on h3 - h1 into maintenance (local), upgrade and reboot h1 - the engine crashes and does not start elsewhere; VM(cinder) on h3 on the same gluster volume pauses.
>>> >>> - Host 1 takes about 5 minutes to reboot (enterprise box with all its normal BIOS probing).
>>> >>> - Engine starts after h1 comes back and stabilises.
>>> >>> - VM(cinder) unpauses itself; VM(reports) continued fine the whole time. I can do no diagnosis on the 2 VMs as the engine is not available.
>>> >>> - Local maintenance removed from h1.
>>> >>>
>>> >>> I don't believe the issue is with gluster itself, as the volume remains accessible on all hosts during this time, albeit with a missing server (gluster volume status) as each gluster server is rebooted.
>>> >>>
>>> >>> Gluster was upgraded as part of the process; no issues were seen here.
>>> >>>
>>> >>> I have been able to duplicate the issue without the upgrade by following the same sort of timeline.
>>> >>>
>>> >>> ________________________________
>>> >>> From: Sandro Bonazzola <sbonazzo@redhat.com>
>>> >>> Sent: Monday, 11 April 2016 7:11 PM
>>> >>> To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak; Sahina Bose
>>> >>> Cc: Bond, Darryl; users
>>> >>> Subject: Re: [ovirt-users] Hosted engine on gluster problem
>>> >>>
>>> >>> On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck <hawk@tbi.univie.ac.at> wrote:
>>> >>> Hi Darryl,
>>> >>>
>>> >>> I'm still experimenting with my oVirt installation so I tried to recreate the problems you've described.
>>> >>>
>>> >>> My setup has three HA hosts for virtualization and three machines for the gluster replica 3 setup.
>>> >>>
>>> >>> I manually migrated the Engine from the initial install host (one) to host three, then shut down host one manually and interrupted the fencing mechanisms so the host stayed down. This didn't bother the Engine VM at all.
>>> >>>
>>> >>> Did you move host one to maintenance before shutting it down? Or is this a crash recovery test?
>>> >>>
>>> >>> To make things a bit more challenging I then shut down host three while running the Engine VM. Of course the Engine was down for some time until host two detected the problem. It started the Engine VM and everything seems to be running quite well without the initial install host.
>>> >>>
>>> >>> Thanks for the feedback!
>>> >>>
>>> >>> My only problem is that the HA agents on hosts two and three refuse to start after a reboot due to the fact that the configuration of the hosted engine is missing. I wrote another mail to users@ovirt.org about that.
>>> >>>
>>> >>> This is weird. Martin, Simone, can you please investigate this?
>>> >>>
>>> >>> Cheers
>>> >>> Richard
>>> >>>
>>> >>> On 04/08/2016 01:38 AM, Bond, Darryl wrote:
>>> >>> > There seems to be a pretty severe bug with using hosted engine on gluster.
>>> >>> >
>>> >>> > If the host that was used as the initial hosted-engine --deploy host goes away, the engine VM will crash and cannot be restarted until the host comes back.
>>> >>>
>>> >>> Is this a hyperconverged setup?
>>> >>>
>>> >>> > This is regardless of which host the engine was currently running on.
>>> >>> >
>>> >>> > The issue seems to be buried in the bowels of VDSM and is not an issue with gluster itself.
>>> >>>
>>> >>> Sahina, can you please investigate this?
>>> >>>
>>> >>> > The gluster filesystem is still accessible from the host that was running the engine. The issue has been submitted to bugzilla but the fix is some way off (4.1).
>>> >>> >
>>> >>> > Can my hosted engine be converted to use NFS (using the gluster NFS server on the same filesystem) without rebuilding my hosted engine (i.e. change domainType=glusterfs to domainType=nfs)?
>>> >>> >
>>> >>> > What effect would that have on the hosted-engine storage domain inside oVirt, i.e. would the same filesystem be mounted twice or would it just break?
>>> >>> >
>>> >>> > Will this actually fix the problem, or does it have the same issue when the hosted engine is on NFS?
>>> >>> >
>>> >>> > Darryl
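As a side note on the round-robin DNS idea Nir mentions above, a minimal sketch of such an entry (the zone, names and addresses are made up; whether the glusterfs mount helper actually falls back to the other A records is worth verifying before relying on it):

  ; zone file excerpt: one name resolving to all three gluster hosts
  gluster-engine   IN   A   192.0.2.11
  gluster-engine   IN   A   192.0.2.12
  gluster-engine   IN   A   192.0.2.13

The storage entry point would then be something like gluster-engine.yourdomain.com:/engine instead of a single host's name.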