I'm not planning to move to oVirt 4 until it gets stable, so it would be great to backport this to 3.6 or, ideally, develop it in the next release of the 3.6 branch. Considering the urgency (it's a single point of failure) versus the complexity, the proposed fix shouldn't be hard to make.

I'm running a production environment today on top of gluster replica 3, and this is the only SPOF I have.

Thanks
Luiz

On Fri, Apr 15, 2016 at 3:05 AM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:

On Thu, Apr 14, 2016 at 7:35 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Apr 13, 2016 at 4:34 PM, Luiz Claudio Prazeres Goncalves
<luizcpg@gmail.com> wrote:
> Nir, here is the problem:
> https://bugzilla.redhat.com/show_bug.cgi?id=1298693
>
> When you do a hosted-engine --deploy and pick "glusterfs" you don't have a
> way to define the mount options, and therefore no way to use
> "backupvolfile-server"; however, when you create a storage domain from the
> UI you can, as in the attached screenshot.
>
>
> In the hosted-engine --deploy, I would expect a flow which includes not only
> the "gluster" entry point, but also the gluster mount options, which are
> missing today. This option would be optional, but would remove the single
> point of failure described in Bug 1298693.
>
> For example:
>
> Existing entry point in the "hosted-engine --deploy" flow:
> gluster1.xyz.com:/engine

I agree, this feature must be supported.

It will, and it's currently targeted to 4.0.

> Missing option in the "hosted-engine --deploy" flow:
> backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log
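>
> For illustration, with such an option the engine storage domain mount would
> end up looking roughly like this (hostnames, volume name and mount point are
> just examples; the exact vdsm invocation may differ):
>
>   mount -t glusterfs \
>     -o backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log \
>     gluster1.xyz.com:/engine \
>     /rhev/data-center/mnt/glusterSD/gluster1.xyz.com:_engine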
>
> Sandro, this seems to me a simple issue which can be easily fixed.
>
> What do you think?
>
> Regards
> -Luiz
>
>
>
> 2016-04-13 4:15 GMT-03:00 Sandro Bonazzola <sbonazzo@redhat.com>:
>>
>>
>>
>> On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer@redhat.com> wrote:
>>>
>>> On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves
>>> <luizcpg@gmail.com> wrote:
>>> > Hi Sandro, I've been using gluster with 3 external hosts for a while and
>>> > things are working pretty well; however, this single point of failure
>>> > looks like a simple feature to implement, but critical to anyone who wants
>>> > to use gluster in production. This is not hyperconvergence, which has
>>> > other issues/implications. So, why not have this feature in the 3.6
>>> > branch? It looks like it's just a matter of letting vdsm use the
>>> > 'backupvol-server' option when mounting the engine domain and making the
>>> > proper tests.
>>>
>>> Can you explain what the problem is, and what the suggested solution is?
>>>
>>> Engine and vdsm already support the backupvol-server option - you can
>>> define this option in the storage domain options when you create a gluster
>>> storage domain. With this option vdsm should be able to connect to the
>>> gluster storage domain even if a brick is down.
>>>
>>> If you don't have this option in engine, you probably cannot add it with
>>> hosted engine setup, since to edit it you must put the storage domain in
>>> maintenance, and if you do this the engine vm will be killed :-) This is
>>> one of the issues with the engine managing the storage domain it runs on.
>>>
>>> I think the best way to avoid this issue is to add a DNS entry providing
>>> the addresses of all the gluster bricks, and use this address for the
>>> gluster storage domain. This way the glusterfs mount helper can mount the
>>> domain even if one of the gluster bricks is down.
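>>>
>>> For example, something along these lines in the zone file (names and
>>> addresses here are made up, just to illustrate the round-robin idea):
>>>
>>>   ; gluster.xyz.com resolves to all three brick hosts
>>>   gluster    IN  A  192.0.2.11   ; gluster1.xyz.com
>>>   gluster    IN  A  192.0.2.12   ; gluster2.xyz.com
>>>   gluster    IN  A  192.0.2.13   ; gluster3.xyz.com
>>>
>>> and then point the storage domain at gluster.xyz.com:/engine.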
>>>
>>> Again, we will need some magic from the hosted engine developers to modify
>>> the address of the hosted engine gluster domain on an existing system.
>>
>>
>> Magic won't happen without a bz :-) Please open one describing what's
>> requested.
>>
>>
>>>
>>>
>>> Nir
>>>
>>> >
>>> > Could you add this feature to the next release of the 3.6 branch?
>>> >
>>> > Thanks
>>> > Luiz
>>> >
>>> > On Tue, Apr 12, 2016 at 5:03 AM, Sandro Bonazzola <sbonazzo@redhat.com>
>>> > wrote:
>>> >>
>>> >> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl <dbond@nrggos.com.au>
>>> >> wrote:
>>> >>>
>>> >>> My setup is hyperconverged. I have placed my test results in
>>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1298693
>>> >>>
>>> >>
>>> >> Ok, so you're aware of the limitation of the single point of failure.
>>> >> If you drop the host referenced in the hosted engine configuration from
>>> >> the initial setup, it won't be possible to connect to the shared storage
>>> >> even if the other hosts in the cluster are up, since the entry point is
>>> >> down.
>>> >> Note that hyperconverged deployment is not supported in 3.6.
>>> >>
>>> >>
>>> >>>
>>> >>>
>>> >>> Short description of setup:
>>> >>>
>>> >>> 3 hosts with 2 disks each, set up with gluster replica 3 across the 6
>>> >>> disks, volume name hosted-engine.
>>> >>>
>>> >>> Hostname hosted-storage configured in /etc/hosts to point to host1.
>>> >>>
>>> >>> Installed hosted engine on host1 with the hosted engine storage path =
>>> >>> hosted-storage:/hosted-engine
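>>> >>>
>>> >>> For illustration, the layout is roughly equivalent to something like this
>>> >>> (brick paths and the address are examples, not the exact commands used):
>>> >>>
>>> >>>   # replica 3 volume over 6 bricks, 2 per host
>>> >>>   gluster volume create hosted-engine replica 3 \
>>> >>>     host1:/bricks/b1/he host2:/bricks/b1/he host3:/bricks/b1/he \
>>> >>>     host1:/bricks/b2/he host2:/bricks/b2/he host3:/bricks/b2/he
>>> >>>
>>> >>>   # /etc/hosts entry on each node, pointing at host1
>>> >>>   192.0.2.11  hosted-storage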
>>> >>>
>>> >>> Install of the first engine on h1 was successful. Hosts h2 and h3 added
>>> >>> to the hosted engine. All works fine.
>>> >>>
>>> >>> Additional storage and non-hosted-engine hosts added etc.
>>> >>>
>>> >>> Additional VMs added to hosted-engine storage (oVirt Reports VM and
>>> >>> Cinder VM). Additional VMs are hosted by other storage - cinder and NFS.
>>> >>>
>>> >>> The system is in production.
>>> >>>
>>> >>>
>>> >>> Engine can be migrated around with the web interface.
>>> >>>
>>> >>>
>>> >>> - 3.6.4 upgrade released, followed the upgrade guide; engine is upgraded
>>> >>> first, new CentOS kernel requires a host reboot.
>>> >>>
>>> >>> - Engine placed on h2 - h3 into maintenance (local), upgrade and reboot
>>> >>> h3 - No issues - Local maintenance removed from h3.
>>> >>>
>>> >>> - Engine placed on h3 - h2 into maintenance (local), upgrade and reboot
>>> >>> h2 - No issues - Local maintenance removed from h2.
>>> >>>
>>> >>> - Engine placed on h3 - h1 into maintenance (local), upgrade and reboot
>>> >>> h1 - engine crashes and does not start elsewhere, VM(cinder) on h3 on the
>>> >>> same gluster volume pauses.
>>> >>>
>>> >>> - Host 1 takes about 5 minutes to reboot (enterprise box with all its
>>> >>> normal BIOS probing)
>>> >>>
>>> >>> - Engine starts after h1 comes back and stabilises
>>> >>>
>>> >>> - VM(cinder) unpauses itself, VM(reports) continued fine the whole time.
>>> >>> I can do no diagnosis on the 2 VMs as the engine is not available.
>>> >>>
>>> >>> - Local maintenance removed from h1
>>> >>>
>>> >>>
>>> >>> I don't believe the issue is with gluster itself, as the volume remains
>>> >>> accessible on all hosts during this time, albeit with a missing server
>>> >>> (gluster volume status) as each gluster server is rebooted.
>>> >>>
>>> >>> Gluster was upgraded as part of the process; no issues were seen here.
>>> >>>
>>> >>>
>>> >>> I have been able to duplicate the issue without the upgrade by following
>>> >>> the same sort of timeline.
>>> >>>
>>> >>>
>>> >>> ________________________________
>>> >>> From: Sandro Bonazzola <sbonazzo@redhat.com>
>>> >>> Sent: Monday, 11 April 2016 7:11 PM
>>> >>> To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak; Sahina
>>> >>> Bose
>>> >>> Cc: Bond, Darryl; users
>>> >>> Subject: Re: [ovirt-users] Hosted engine on gluster problem
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck
>>> >>> <hawk@tbi.univie.ac.at> wrote:
>>> >>> Hi Darryl,
>>> >>>
>>> >>> I'm still experimenting with my oVirt installation so I tried to
>>> >>> recreate the problems you've described.
>>> >>>
>>> >>> My setup has three HA hosts for virtualization and three machines
>>> >>> for the gluster replica 3 setup.
>>> >>>
>>> >>> I manually migrated the Engine from the initial install host (one)
>>> >>> to host three. Then I shut down host one manually and interrupted the
>>> >>> fencing mechanisms so the host stayed down. This didn't bother the
>>> >>> Engine VM at all.
>>> >>>
>>> >>> Did you move host one to maintenance before shutting it down?
>>> >>> Or is this a crash recovery test?
>>> >>>
>>> >>>
>>> >>>
>>> >>> To make things a bit more challenging I then shut down host three
>>> >>> while running the Engine VM. Of course the Engine was down for some
>>> >>> time until host two detected the problem. It started the Engine VM
>>> >>> and everything seems to be running quite well without the initial
>>> >>> install host.
>>> >>>
>>> >>> Thanks for the feedback!
>>> >>>
>>> >>>
>>> >>>
>>> >>> My only problem is that the HA agents on hosts two and three refuse to
>>> >>> start after a reboot due to the fact that the configuration of the
>>> >>> hosted engine is missing. I wrote another mail to users@ovirt.org
>>> >>> about that.
>>> >>>
>>> >>> This is weird. Martin, Simone, can you please investigate this?
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> Cheers
>>> >>> Richard
>>> >>>
>>> >>> On 04/08/2016 01:38 AM, Bond, Darryl wrote:
>>> >>> > There seems to be a pretty severe bug with using hosted engine on
>>> >>> > gluster.
>>> >>> >
>>> >>> > If the host that was used as the initial hosted-engine --deploy host
>>> >>> > goes away, the engine VM will crash and cannot be restarted until the
>>> >>> > host comes back.
>>> >>>
>>> >>> Is this a hyperconverged setup?
>>> >>>
>>> >>>
>>> >>> >
>>> >>> > This is regardless of which host the engine was currently running on.
>>> >>> >
>>> >>> >
>>> >>> > The issue seems to be buried in the bowels of VDSM and is not an issue
>>> >>> > with gluster itself.
>>> >>>
>>> >>> Sahina, can you please investigate this?
>>> >>>
>>> >>>
>>> >>> >
>>> >>> > The gluster filesystem is still accessible from the host that was
>>> >>> > running the engine. The issue has been submitted to bugzilla but the
>>> >>> > fix is some way off (4.1).
>>> >>> >
>>> >>> >
>>> >>> > Can my hosted engine be converted to use NFS (using the gluster NFS
>>> >>> > server on the same filesystem) without rebuilding my hosted engine
>>> >>> > (i.e. change domainType=glusterfs to domainType=nfs)?
>>> >>>
>>> >>> >
>>> >>> > What effect would that have on the hosted-engine storage domain inside
>>> >>> > oVirt, i.e. would the same filesystem be mounted twice or would it just
>>> >>> > break?
>>> >>> >
>>> >>> >
>>> >>> > Will this actually fix the problem, or does it have the same issue when
>>> >>> > the hosted engine is on NFS?
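>>> >>> >
>>> >>> > In other words, the change would be roughly this in the hosted engine
>>> >>> > configuration (an illustrative excerpt; the actual file layout may
>>> >>> > differ between releases):
>>> >>> >
>>> >>> >   # /etc/ovirt-hosted-engine/hosted-engine.conf (excerpt)
>>> >>> >   # before:
>>> >>> >   domainType=glusterfs
>>> >>> >   storage=hosted-storage:/hosted-engine
>>> >>> >   # after (using gluster's built-in NFS server on the same volume):
>>> >>> >   domainType=nfs
>>> >>> >   storage=hosted-storage:/hosted-engine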
>>> >>> >
>>> >>> >
>>> >>> > Darryl
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>>
>>> >>>
>>> >>> --
>>> >>> /dev/null
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Sandro Bonazzola
>>> >>> Better technology. Faster innovation. Powered by community collaboration.
>>> >>> See how it works at redhat.com
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Sandro Bonazzola
>>> >> Better technology. Faster innovation. Powered by community collaboration.
>>> >> See how it works at redhat.com
>>> >
>>> >
>>> >
>>
>>
>>
>>
>> --
>> Sandro Bonazzola
>> Better technology. Faster innovation. Powered by community collaboration.
>> See how it works at redhat.com
>
>

--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com