Hosted engine on gluster problem

There seems to be a pretty severe bug with using hosted engine on gluster. If the host that was used as the initial hosted-engine --deploy host goes away, the engine VM will crash and cannot be restarted until that host comes back. This is regardless of which host the engine was currently running on.

The issue seems to be buried in the bowels of VDSM and is not an issue with gluster itself. The gluster filesystem is still accessible from the host that was running the engine. The issue has been submitted to bugzilla but the fix is some way off (4.1).

Can my hosted engine be converted to use NFS (using the gluster NFS server on the same filesystem) without rebuilding my hosted engine (i.e. change domainType=glusterfs to domainType=nfs)? What effect would that have on the hosted-engine storage domain inside oVirt, i.e. would the same filesystem be mounted twice or would it just break? Will this actually fix the problem, or does it have the same issue when the hosted engine is on NFS?

Darryl
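For reference, the change being asked about would amount to editing the hosted-engine configuration on each HA host, roughly like the sketch below. The file path and the storage= key are assumptions based on typical 3.6-era deployments (the domainType values are the ones quoted above), and whether oVirt actually tolerates this switch is exactly the open question.

    # /etc/ovirt-hosted-engine/hosted-engine.conf (sketch only - back the file up first)
    # before:
    #   domainType=glusterfs
    #   storage=hosted-storage:/hosted-engine
    # after (gluster's built-in NFS server exports the volume under the same name):
    domainType=nfs
    storage=hosted-storage:/hosted-engine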

Hi Darryl,

I'm still experimenting with my oVirt installation, so I tried to recreate the problems you've described.

My setup has three HA hosts for virtualization and three machines for the gluster replica 3 setup.

I manually migrated the Engine from the initial install host (one) to host three. Then shut down host one manually and interrupted the fencing mechanisms so the host stayed down. This didn't bother the Engine VM at all.

To make things a bit more challenging I then shut down host three while it was running the Engine VM. Of course the Engine was down for some time until host two detected the problem. It started the Engine VM and everything seems to be running quite well without the initial install host.

My only problem is that the HA agents on hosts two and three refuse to start after a reboot because the configuration of the hosted engine is missing. I wrote another mail to users@ovirt.org about that.

Cheers
Richard

On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck <hawk@tbi.univie.ac.at> wrote:
Hi Darryl,
I'm still experimenting with my oVirt installation so I tried to recreate the problems you've described.
My setup has three HA hosts for virtualization and three machines for the gluster replica 3 setup.
I manually migrated the Engine from the initial install host (one) to host three. Then shut down host one manually and interrupted the fencing mechanisms so the host stayed down. This didn't bother the Engine VM at all.
Did you move host one to maintenance before shutting down? Or is this a crash recovery test?
To make things a bit more challenging I then shut down host three while running the Engine VM. Of course the Engine was down for some time until host two detected the problem. It started the Engine VM and everything seems to be running quite well without the initial install host.
Thanks for the feedback!
My only problem is that the HA agents on hosts two and three refuse to start after a reboot because the configuration of the hosted engine is missing. I wrote another mail to users@ovirt.org about that.
This is weird. Martin, Simone, can you please investigate this?
Cheers Richard
On 04/08/2016 01:38 AM, Bond, Darryl wrote:
There seems to be a pretty severe bug with using hosted engine on gluster.
If the host that was used as the initial hosted-engine --deploy host goes away, the engine VM will crash and cannot be restarted until the host comes back.
Is this a hyperconverged setup?
This is regardless of which host the engine was currently running on.
The issue seems to be buried in the bowels of VDSM and is not an issue with gluster itself.
Sahina, can you please investigate this?
The gluster filesystem is still accessible from the host that was running the engine. The issue has been submitted to bugzilla but the fix is some way off (4.1).
Can my hosted engine be converted to use NFS (using the gluster NFS server on the same filesystem) without rebuilding my hosted engine (i.e. change domainType=glusterfs to domainType=nfs)?
What effect would that have on the hosted-engine storage domain inside oVirt, i.e. would the same filesystem be mounted twice or would it just break?
Will this actually fix the problem, or does it have the same issue when the hosted engine is on NFS?
Darryl
-- Sandro Bonazzola Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com

On 04/11/2016 11:11 AM, Sandro Bonazzola wrote:

Did you move host one to maintenance before shutting down? Or is this a crash recovery test?
I did not put the machine in maintenance mode. Just issued 'poweroff' on the command line. Same for host one as host three.
-- /dev/null

My setup is hyperconverged. I have placed my test results in https://bugzilla.redhat.com/show_bug.cgi?id=1298693

Short description of setup:

3 hosts with 2 disks each, set up with gluster replica 3 across the 6 disks, volume name hosted-engine. Hostname hosted-storage configured in /etc/hosts to point to host1. Installed hosted engine on host1 with the hosted engine storage path = hosted-storage:/hosted-engine. Install of the first engine on h1 successful. Hosts h2 and h3 added to the hosted engine. All works fine.

Additional storage and non-hosted-engine hosts added etc. Additional VMs added to hosted-engine storage (oVirt Reports VM and Cinder VM). Additional VMs are hosted by other storage - cinder and NFS. The system is in production. Engine can be migrated around with the web interface.

- 3.6.4 upgrade released; followed the upgrade guide, engine is upgraded first, new CentOS kernel requires host reboot.
- Engine placed on h2 - h3 into maintenance (local), upgrade and reboot h3 - no issues - local maintenance removed from h3.
- Engine placed on h3 - h2 into maintenance (local), upgrade and reboot h2 - no issues - local maintenance removed from h2.
- Engine placed on h3 - h1 into maintenance (local), upgrade and reboot h1 - engine crashes and does not start elsewhere, VM(cinder) on h3 on the same gluster volume pauses.
- Host 1 takes about 5 minutes to reboot (enterprise box with all its normal BIOS probing).
- Engine starts after h1 comes back and stabilises.
- VM(cinder) unpauses itself, VM(reports) continued fine the whole time. I can do no diagnosis on the 2 VMs as the engine is not available.
- Local maintenance removed from h1.

I don't believe the issue is with gluster itself, as the volume remains accessible on all hosts during this time, albeit with a missing server (gluster volume status) as each gluster server is rebooted. Gluster was upgraded as part of the process; no issues were seen here.

I have been able to duplicate the issue without the upgrade by following the same sort of timeline.
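To make the entry point concrete, the setup described above boils down to something like this on each host (the address is a placeholder, not taken from the original report):

    # /etc/hosts on every HA host (illustrative)
    192.168.1.11   hosted-storage    # resolves to host1 only
    # hosted-engine storage path given at deploy time: hosted-storage:/hosted-engine

Because hosted-storage resolves to a single machine, the hosted-engine mount has exactly one entry point even though the gluster volume itself is replica 3 across all three hosts.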

On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl <dbond@nrggos.com.au> wrote:
My setup is hyperconverged. I have placed my test results in https://bugzilla.redhat.com/show_bug.cgi?id=1298693
OK, so you're aware of the limitation of the single point of failure. If you drop the host referenced in the hosted-engine configuration for the initial setup, it won't be able to connect to the shared storage even if the other hosts in the cluster are up, since the entry point is down. Note that hyperconverged deployment is not supported in 3.6.
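A quick way to see that single entry point in practice on a running host (generic commands, nothing oVirt-specific; exact output will vary):

    # list glusterfs FUSE mounts and the server part of their mount spec
    mount -t fuse.glusterfs
    # check the volume itself from gluster's point of view, as mentioned in the report
    gluster volume status hosted-engine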

Hi Sandro,

I've been using gluster with 3 external hosts for a while and things are working pretty well. However, this single point of failure looks like a simple feature to implement, but critical to anyone who wants to use gluster in production. This is not hyperconvergence, which has other issues/implications. So, why not have this feature out on the 3.6 branch? It looks like it would just be a matter of letting vdsm use the 'backupvol-server' option when mounting the engine domain and making the property tests.

Could you add this feature to the next release of the 3.6 branch?

Thanks
Luiz

On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote:
Hi Sandro, I've been using gluster with 3 external hosts for a while and things are working pretty well. However, this single point of failure looks like a simple feature to implement, but critical to anyone who wants to use gluster in production. This is not hyperconvergence, which has other issues/implications. So, why not have this feature out on the 3.6 branch? It looks like it would just be a matter of letting vdsm use the 'backupvol-server' option when mounting the engine domain and making the property tests.
Can you explain what the problem is, and what the suggested solution is?

Engine and vdsm already support the backupvol-server option - you can define this option in the storage domain options when you create a gluster storage domain. With this option vdsm should be able to connect to the gluster storage domain even if a brick is down.

If you don't have this option in engine, you probably cannot add it with hosted-engine setup, since to edit it you must put the storage domain in maintenance, and if you do this the engine VM will be killed :-) This is one of the issues with the engine managing the storage domain it runs on.

I think the best way to avoid this issue is to add a DNS entry providing the addresses of all the gluster bricks, and use this address for the gluster storage domain. This way the glusterfs mount helper can mount the domain even if one of the gluster bricks is down.

Again, we will need some magic from the hosted engine developers to modify the address of the hosted engine gluster domain on an existing system.

Nir
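A minimal sketch of the DNS approach Nir describes, with made-up names and addresses - a single name returning all gluster hosts, which the storage domain address would then use:

    ; BIND-style zone fragment (hostnames and addresses are hypothetical)
    gluster-storage    IN  A  192.168.1.11   ; host1
    gluster-storage    IN  A  192.168.1.12   ; host2
    gluster-storage    IN  A  192.168.1.13   ; host3

The storage domain would then point at gluster-storage:/hosted-engine rather than at a single host's name; whether the mount helper copes with the first answer being a dead host is Nir's claim above, not something demonstrated here.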

On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer@redhat.com> wrote:
Again, we will need some magic from the hosted engine developers to modify the address of the hosted engine gluster domain on an existing system.
Magic won't happen without a bz :-) please open one describing what's requested.

Nir, here is the problem: https://bugzilla.redhat.com/show_bug.cgi?id=1298693

When you do a hosted-engine --deploy and pick "glusterfs" you don't have a way to define the mount options, and therefore no way to use "backupvol-server"; however, when you create a storage domain from the UI you can, as in the attached screenshot.

In the hosted-engine --deploy flow, I would expect to be able to give not only the "gluster" entry point, but also the gluster mount options, which are missing today. This option would be optional, but it would remove the single point of failure described in bug 1298693.

For example, the existing entry point in the hosted-engine --deploy flow is gluster1.xyz.com:/engine. The missing option in that flow would be something like backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log

Sandro, it seems to me a simple solution which can be easily fixed. What do you think?

Regards
-Luiz
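For illustration, if the deploy flow accepted those options, the resulting mount for the engine domain would look roughly like this. The mount-point path follows vdsm's usual naming for gluster domains and is an assumption, as is running the mount by hand at all - normally vdsm performs it:

    # hypothetical manual equivalent of the hosted-engine domain mount
    mount -t glusterfs \
        -o backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log \
        gluster1.xyz.com:/engine \
        /rhev/data-center/mnt/glusterSD/gluster1.xyz.com:_engine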
On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote:
Hi Sandro, I've been using gluster with 3 external hosts for a while and things are working pretty well, however this single point of failure looks like a simple feature to implement,but critical to anyone who wants to use gluster on production . This is not hyperconvergency which has other issues/implications. So , why not have this feature out on 3.6 branch? It looks like just let vdsm use the 'backupvol-server' option when mounting the engine domain and make the property tests.
Can you explain what is the problem, and what is the suggested solution?
Engine and vdsm already support the backupvol-server option - you can define this option in the storage domain options when you create a gluster storage domain. With this option vdsm should be able to connect to gluster storage domain even if a brick is down.
If you don't have this option in engine , you probably cannot add it with hosted engine setup, since for editing it you must put the storage domain in maintenance and if you do this the engine vm will be killed :-) This is is one of the issues with engine managing the storage domain it runs on.
I think the best way to avoid this issue, is to add a DNS entry providing the addresses of all the gluster bricks, and use this address for the gluster storage domain. This way the glusterfs mount helper can mount the domain even if one of the gluster bricks are down.
Again, we will need some magic from the hosted engine developers to modify the address of the hosted engine gluster domain on existing system.
Magic won't happen without a bz :-) please open one describing what's requested.
Nir
Could you add this feature to the next release of 3.6 branch?
Thanks Luiz
Em ter, 12 de abr de 2016 05:03, Sandro Bonazzola <sbonazzo@redhat.com> escreveu:
On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl <dbond@nrggos.com.au> wrote:
My setup is hyperconverged. I have placed my test results in https://bugzilla.redhat.com/show_bug.cgi?id=1298693
Ok, so you're aware about the limitation of the single point of
If you drop the host referenced in hosted engine configuration for the initial setup it won't be able to connect to shared storage even if the other hosts in the cluster are up since the entry point is down. Note that hyperconverged deployment is not supported in 3.6.
Short description of setup:
3 hosts with 2 disks each set up with gluster replica 3 across the 6 disks volume name hosted-engine.
Hostname hosted-storage configured in /etc//hosts to point to the
host1.
Installed hosted engine on host1 with the hosted engine storage path = hosted-storage:/hosted-engine
Install first engine on h1 successful. Hosts h2 and h3 added to the hosted engine. All works fine.
Additional storage and non-hosted engine hosts added etc.
Additional VMs added to hosted-engine storage (oVirt Reports VM and Cinder VM). Additional VMs are hosted by other storage - cinder and NFS.
The system is in production.
Engine can be migrated around with the web interface.
- 3.6.4 upgrade released, follow the upgrade guide, engine is upgraded first, new CentOS kernel requires host reboot.
- Engine placed on h2 - h3 into maintenance (local) upgrade and Reboot h3 - No issues - Local maintenance removed from h3.
- Engine placed on h3 - h2 into maintenance (local) upgrade and Reboot h2 - No issues - Local maintenance removed from h2.
- Engine placed on h3 - h1 into maintenance (local) upgrade and reboot h1 - engine crashes and does not start elsewhere, VM(cinder) on h3 on same gluster volume pauses.
- Host 1 takes about 5 minutes to reboot (Enterprise box with all its normal BIOS probing)
- Engine starts after h1 comes back and stabilises
- VM(cinder) unpauses itself, VM(reports) continued fine the whole time. I can do no diagnosis on the 2 VMs as the engine is not available.
- Local maintenance removed from h1
I don't believe the issue is with gluster itself as the volume remains accessible on all hosts during this time albeit with a missing server (gluster volume status) as each gluster server is rebooted.
Gluster was upgraded as part of the process, no issues were seen here.
I have been able to duplicate the issue without the upgrade by following the same sort of timeline.
________________________________ From: Sandro Bonazzola <sbonazzo@redhat.com> Sent: Monday, 11 April 2016 7:11 PM To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak; Sahina Bose Cc: Bond, Darryl; users Subject: Re: [ovirt-users] Hosted engine on gluster problem
On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck <hawk@tbi.univie.ac.at> wrote: Hi Darryl,
I'm still experimenting with my oVirt installation so I tried to recreate the problems you've described.
My setup has three HA hosts for virtualization and three machines for the gluster replica 3 setup.
I manually migrated the Engine from the initial install host (one) to host three. Then shut down host one manually and interrupted the fencing mechanisms so the host stayed down. This didn't bother the Engine VM at all.
Did you move the host one to maintenance before shutting down? Or is this a crash recovery test?
To make things a bit more challenging I then shut down host three while running the Engine VM. Of course the Engine was down for some time until host two detected the problem. It started the Engine VM and everything seems to be running quite well without the initial install host.
Thanks for the feedback!
My only problem is that the HA agent on host two and three refuse to start after a reboot due to the fact that the configuration of the hosted engine is missing. I wrote another mail to users@ovirt.org about that.
This is weird. Martin, Simone can you please investigate on this?
Cheers Richard
On 04/08/2016 01:38 AM, Bond, Darryl wrote:
There seems to be a pretty severe bug with using hosted engine on gluster.
If the host that was used as the initial hosted-engine --deploy host goes away, the engine VM will crash and cannot be restarted until the host comes back.
is this a hyperconverged setup?
This is regardless of which host the engine was currently running.
The issue seems to be buried in the bowels of VDSM and is not an issue with gluster itself.
Sahina, can you please investigate on this?
The gluster filesystem is still accessible from the host that was running the engine. The issue has been submitted to bugzilla but the fix is some way off (4.1).
Can my hosted engine be converted to use NFS (using the gluster NFS server on the same filesystem) without rebuilding my hosted engine (ie change domainType=glusterfs to domainType=nfs)?
What effect would that have on the hosted-engine storage domain inside oVirt, ie would the same filesystem be mounted twice or would it just break?
Will this actually fix the problem, does it have the same issue when the hosted engine is on NFS?
Darryl

Sandro, any word here? Btw, I'm not talking about hyperconvergence in this case, but 3 external gluster nodes using replica 3. Regards, Luiz

On Thu, Apr 14, 2016 at 7:07 PM, Luiz Claudio Prazeres Goncalves < luizcpg@gmail.com> wrote:
Sandro, any word here? Btw, I'm not talking about hyperconvergence in this case, but 3 external gluster nodes using replica 3.
The whole integration team is currently busy and we don't have enough resources to handle this in the 3.6.6 time frame. We'll be happy to help review patches, but we have more urgent items to handle right now.

On Wed, Apr 13, 2016 at 4:34 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote:
In the hosted-engine --deploy, I would expect a flow which includes not only the "gluster" entrypoint, but also the gluster mount options which is missing today. This option would be optional, but would remove the single point of failure described on the Bug 1298693.
I agree, this feature must be supported.

On Thu, Apr 14, 2016 at 7:35 PM, Nir Soffer <nsoffer@redhat.com> wrote:
I agree, this feature must be supported.
It will, and it's currently targeted to 4.0.

I'm not planning to move to oVirt 4 until it gets stable, so it would be great to backport this to 3.6 or, ideally, have it developed in the next release of the 3.6 branch. Considering the urgency (it's a single point of failure) versus the complexity, it shouldn't be hard to make the proposed fix. I'm using today a production environment on top of gluster replica 3 and this is the only single point of failure I have. Thanks Luiz
On Thu, Apr 14, 2016 at 7:35 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Apr 13, 2016 at 4:34 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote:
Nir, here is the problem: https://bugzilla.redhat.com/show_bug.cgi?id=1298693
When you do a hosted-engine --deploy and pick "glusterfs" you don't have a way to define the mount options, therefore, the use of the "backupvol-server", however when you create a storage domain from the UI you can, like the attached screen shot.
In the hosted-engine --deploy, I would expect a flow which includes not only the "gluster" entrypoint, but also the gluster mount options which is missing today. This option would be optional, but would remove the single point of failure described on the Bug 1298693.
for example:
Existing entry point on the "hosted-engine --deploy" flow gluster1.xyz.com:/engine
I agree, this feature must be supported.
It will, and it's currently targeted to 4.0.
Missing option on the "hosted-engine --deploy" flow : backupvolfile-server=gluster2.xyz.com ,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log
Sandro, it seems to me a simple solution which can be easily fixed.
What do you think?
Regards -Luiz
2016-04-13 4:15 GMT-03:00 Sandro Bonazzola <sbonazzo@redhat.com>:
On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer@redhat.com>
On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote:
Hi Sandro, I've been using gluster with 3 external hosts for a while and things are working pretty well, however this single point of failure looks like a simple feature to implement,but critical to anyone who wants
to
use gluster on production . This is not hyperconvergency which has other issues/implications. So , why not have this feature out on 3.6 branch? It looks like just let vdsm use the 'backupvol-server' option when mounting the engine domain and make the property tests.
Can you explain what is the problem, and what is the suggested solution?
Engine and vdsm already support the backupvol-server option - you can define this option in the storage domain options when you create a gluster storage domain. With this option vdsm should be able to connect to gluster storage domain even if a brick is down.
If you don't have this option in engine , you probably cannot add it with hosted engine setup, since for editing it you must put the storage domain in maintenance and if you do this the engine vm will be killed :-) This is is one of the issues with engine managing the storage domain it runs on.
I think the best way to avoid this issue, is to add a DNS entry providing the addresses of all the gluster bricks, and use this address for the gluster storage domain. This way the glusterfs mount helper can mount the domain even if one of the gluster bricks are down.
Again, we will need some magic from the hosted engine developers to modify the address of the hosted engine gluster domain on existing system.
Magic won't happen without a bz :-) please open one describing what's requested.
Nir
Could you add this feature to the next release of 3.6 branch?
Thanks Luiz
Em ter, 12 de abr de 2016 05:03, Sandro Bonazzola <
sbonazzo@redhat.com>
escreveu: > > On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl < dbond@nrggos.com.au> > wrote: >> >> My setup is hyperconverged. I have placed my test results in >> https://bugzilla.redhat.com/show_bug.cgi?id=1298693 >> > > Ok, so you're aware about the limitation of the single point of > failure. > If you drop the host referenced in hosted engine configuration for
> initial setup it won't be able to connect to shared storage even if > the > other hosts in the cluster are up since the entry point is down. > Note that hyperconverged deployment is not supported in 3.6. > > >> >> >> Short description of setup: >> >> 3 hosts with 2 disks each set up with gluster replica 3 across
>> disks volume name hosted-engine. >> >> Hostname hosted-storage configured in /etc//hosts to point to the >> host1. >> >> Installed hosted engine on host1 with the hosted engine storage
>> = >> hosted-storage:/hosted-engine >> >> Install first engine on h1 successful. Hosts h2 and h3 added to
>> hosted engine. All works fine. >> >> Additional storage and non-hosted engine hosts added etc. >> >> Additional VMs added to hosted-engine storage (oVirt Reports VM and >> Cinder VM). Additional VM's are hosted by other storage - cinder and >> NFS. >> >> The system is in production. >> >> >> Engine can be migrated around with the web interface. >> >> >> - 3.6.4 upgrade released, follow the upgrade guide, engine is >> upgraded >> first , new Centos kernel requires host reboot. >> >> - Engine placed on h2 - h3 into maintenance (local) upgrade and >> Reboot >> h3 - No issues - Local maintenance removed from h3. >> >> - Engine placed on h3 - h2 into maintenance (local) upgrade and >> Reboot >> h2 - No issues - Local maintenance removed from h2. >> >> - Engine placed on h3 -h1 into mainteance (local) upgrade and reboot >> h1 - >> engine crashes and does not start elsewhere, VM(cinder) on h3 on >> same >> gluster volume pauses. >> >> - Host 1 takes about 5 minutes to reboot (Enterprise box with all >> it's >> normal BIOS probing) >> >> - Engine starts after h1 comes back and stabilises >> >> - VM(cinder) unpauses itself, VM(reports) continued fine the whole >> time. >> I can do no diagnosis on the 2 VMs as the engine is not available. >> >> - Local maintenance removed from h1 >> >> >> I don't believe the issue is with gluster itself as the volume >> remains >> accessible on all hosts during this time albeit with a missing server >> (gluster volume status) as each gluster server is rebooted. >> >> Gluster was upgraded as part of the process, no issues were seen >> here. >> >> >> I have been able to duplicate the issue without the upgrade by >> following >> the same sort of timeline. >> >> >> ________________________________ >> From: Sandro Bonazzola <sbonazzo@redhat.com> >> Sent: Monday, 11 April 2016 7:11 PM >> To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak; >> Sahina >> Bose >> Cc: Bond, Darryl; users >> Subject: Re: [ovirt-users] Hosted engine on gluster problem >> >> >> >> On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck >> <hawk@tbi.univie.ac.at<mailto:hawk@tbi.univie.ac.at>> wrote: >> Hi Darryl, >> >> I'm still experimenting with my oVirt installation so I tried to >> recreate the problems you've described. >> >> My setup has three HA hosts for virtualization and three machines >> for the gluster replica 3 setup. >> >> I manually migrated the Engine from the initial install host (one) >> to host three. Then shut down host one manually and interrupted
>> fencing mechanisms so the host stayed down. This didn't bother the >> Engine VM at all. >> >> Did you move the host one to maintenance before shutting down? >> Or is this a crash recovery test? >> >> >> >> To make things a bit more challenging I then shut down host three >> while running the Engine VM. Of course the Engine was down for some >> time until host two detected the problem. It started the Engine VM >> and everything seems to be running quite well without the initial >> install host. >> >> Thanks for the feedback! >> >> >> >> My only problem is that the HA agent on host two and three refuse to >> start after a reboot due to the fact that the configuration of the >> hosted engine is missing. I wrote another mail to >> users@ovirt.org<mailto:users@ovirt.org> >> about that. >> >> This is weird. Martin, Simone can you please investigate on this? >> >> >> >> >> Cheers >> Richard >> >> On 04/08/2016 01:38 AM, Bond, Darryl wrote: >> > There seems to be a pretty severe bug with using hosted engine on >> > gluster. >> > >> > If the host that was used as the initial hosted-engine --deploy >> > host >> > goes away, the engine VM wil crash and cannot be restarted until >> > the host >> > comes back. >> >> is this an Hyperconverged setup? >> >> >> > >> > This is regardless of which host the engine was currently running. >> > >> > >> > The issue seems to be buried in the bowels of VDSM and is not an >> > issue >> > with gluster itself. >> >> Sahina, can you please investigate on this? >> >> >> > >> > The gluster filesystem is still accessable from the host that was >> > running the engine. The issue has been submitted to bugzilla but >> > the fix is >> > some way off (4.1). >> > >> > >> > Can my hosted engine be converted to use NFS (using the gluster NFS >> > server on the same filesystem) without rebuilding my hosted engine >> > (ie >> > change domainType=glusterfs to domainType=nfs)? >> >> > >> > What effect would that have on the hosted-engine storage domain >> > inside >> > oVirt, ie would the same filesystem be mounted twice or would it >> > just break. >> > >> > >> > Will this actually fix the problem, does it have the same issue >> > when >> > the hosted engine is on NFS? >> > >> > >> > Darryl >> > >> > >> > >> > >> > ________________________________ >> > >> > The contents of this electronic message and any attachments are >> > intended only for the addressee and may contain legally
>> > personal, sensitive or confidential information. If you are not
>> > intended >> > addressee, and have received this email, any transmission, >> > distribution, >> > downloading, printing or photocopying of the contents of this >> > message or >> > attachments is strictly prohibited. Any legal privilege or >> > confidentiality >> > attached to this message and attachments is not waived, lost or >> > destroyed by >> > reason of delivery to any person other than intended addressee. If >> > you have >> > received this message and are not the intended addressee you should >> > notify >> > the sender by return email and destroy all copies of the message >> > and any >> > attachments. Unless expressly attributed, the views expressed in >> > this email >> > do not necessarily represent the views of the company. >> > _______________________________________________ >> > Users mailing list >> > Users@ovirt.org<mailto:Users@ovirt.org> >> > http://lists.ovirt.org/mailman/listinfo/users >> > >> >> >> -- >> /dev/null >> >> >> _______________________________________________ >> Users mailing list >> Users@ovirt.org<mailto:Users@ovirt.org> >> http://lists.ovirt.org/mailman/listinfo/users >> >> >> >> >> -- >> Sandro Bonazzola >> Better technology. Faster innovation. Powered by community >> collaboration. >> See how it works at redhat.com<http://redhat.com> > > > > > -- > Sandro Bonazzola > Better technology. Faster innovation. Powered by community > collaboration. > See how it works at redhat.com > _______________________________________________ > Users mailing list > Users@ovirt.org > http://lists.ovirt.org/mailman/listinfo/users
-- Sandro Bonazzola Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com

On Fri, Apr 15, 2016 at 8:00 AM, Luiz Claudio Prazeres Goncalves < luizcpg@gmail.com> wrote:
I'm not planning to move to ovirt 4 until it gets stable, so would be great to backport to 3.6 or ,ideally, gets developed on the next release of 3.6 branch. Considering the urgency (its a single point of failure) x complexity wouldn't be hard to make the proposed fix.
Bumping an old email, sorry. It looks like https://bugzilla.redhat.com/show_bug.cgi?id=1298693 was finished against 3.6.7 according to that RFE. So does that mean that if I add the appropriate lines to my /etc/ovirt-hosted-engine/hosted-engine.conf, then the next time I restart the engine and the agent/brokers to mount that storage point it will use the backupvol-server feature? If so, are the appropriate settings outlined in docs somewhere? Running oVirt 3.6.7 and gluster 3.8.2 on CentOS 7 nodes. I'm using today a production environment on top of gluster replica 3 and
this is the only SPF I have.
Thanks Luiz
Em sex, 15 de abr de 2016 03:05, Sandro Bonazzola <sbonazzo@redhat.com> escreveu:
On Thu, Apr 14, 2016 at 7:35 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Apr 13, 2016 at 4:34 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote:
Nir, here is the problem: https://bugzilla.redhat.com/show_bug.cgi?id=1298693
When you do a hosted-engine --deploy and pick "glusterfs" you don't have a way to define the mount options, and therefore no way to use "backupvol-server"; however, when you create a storage domain from the UI you can, as in the attached screenshot.
In the hosted-engine --deploy, I would expect a flow which includes not only the "gluster" entrypoint, but also the gluster mount options which is missing today. This option would be optional, but would remove the single point of failure described on the Bug 1298693.
for example:
Existing entry point on the "hosted-engine --deploy" flow gluster1.xyz.com:/engine
I agree, this feature must be supported.
It will, and it's currently targeted to 4.0.
Missing option on the "hosted-engine --deploy" flow: backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log
Sandro, it seems to me a simple solution which can be easily fixed.
What do you think?
Regards -Luiz
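For illustration only, a minimal sketch of where such an option string ends up on an already-deployed host: the storage and mnt_options keys in /etc/ovirt-hosted-engine/hosted-engine.conf (the keys Simone describes later in this thread). The gluster1/gluster2 hostnames and the /engine volume are the hypothetical ones from Luiz's example; this is not a complete configuration file.

    # /etc/ovirt-hosted-engine/hosted-engine.conf (sketch, not a full file)
    storage=gluster1.xyz.com:/engine
    mnt_options=backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log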
2016-04-13 4:15 GMT-03:00 Sandro Bonazzola <sbonazzo@redhat.com>:
On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer@redhat.com>
On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote: > Hi Sandro, I've been using gluster with 3 external hosts for a
while
> and > things are working pretty well, however this single point of failure > looks > like a simple feature to implement,but critical to anyone who wants to > use > gluster on production . This is not hyperconvergency which has other > issues/implications. So , why not have this feature out on 3.6 branch? > It > looks like just let vdsm use the 'backupvol-server' option when > mounting the > engine domain and make the property tests.
Can you explain what is the problem, and what is the suggested solution?
Engine and vdsm already support the backupvol-server option - you can define this option in the storage domain options when you create a gluster storage domain. With this option vdsm should be able to connect to gluster storage domain even if a brick is down.
If you don't have this option in engine, you probably cannot add it with hosted engine setup, since for editing it you must put the storage domain in maintenance, and if you do this the engine VM will be killed :-) This is one of the issues with the engine managing the storage domain it runs on.
I think the best way to avoid this issue is to add a DNS entry providing the addresses of all the gluster bricks, and use this address for the gluster storage domain. This way the glusterfs mount helper can mount the domain even if one of the gluster bricks is down.
Again, we will need some magic from the hosted engine developers to modify the address of the hosted engine gluster domain on existing system.
Magic won't happen without a bz :-) please open one describing what's requested.
Nir
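For reference, a minimal sketch of the DNS approach Nir describes above, assuming a BIND-style zone file and hypothetical host names and addresses; any single name that resolves to all of the brick hosts would do.

    ; one name with multiple A records, so the mount entry point resolves to every brick host
    gluster-engine   IN  A  192.0.2.11   ; host01
    gluster-engine   IN  A  192.0.2.12   ; host02
    gluster-engine   IN  A  192.0.2.13   ; host03

The storage domain would then be created with a path like gluster-engine.example.com:/engine instead of a single brick's hostname.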
> > Could you add this feature to the next release of 3.6 branch? > > Thanks > Luiz > > Em ter, 12 de abr de 2016 05:03, Sandro Bonazzola <
sbonazzo@redhat.com>
> escreveu: >> >> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl < dbond@nrggos.com.au> >> wrote: >>> >>> My setup is hyperconverged. I have placed my test results in >>> https://bugzilla.redhat.com/show_bug.cgi?id=1298693 >>> >> >> Ok, so you're aware about the limitation of the single point of >> failure. >> If you drop the host referenced in hosted engine configuration for the >> initial setup it won't be able to connect to shared storage even if >> the >> other hosts in the cluster are up since the entry point is down. >> Note that hyperconverged deployment is not supported in 3.6. >> >> >>> >>> >>> Short description of setup: >>> >>> 3 hosts with 2 disks each set up with gluster replica 3 across
>>> disks volume name hosted-engine. >>> >>> Hostname hosted-storage configured in /etc//hosts to point to the >>> host1. >>> >>> Installed hosted engine on host1 with the hosted engine storage
>>> = >>> hosted-storage:/hosted-engine >>> >>> Install first engine on h1 successful. Hosts h2 and h3 added to
>>> hosted engine. All works fine. >>> >>> Additional storage and non-hosted engine hosts added etc. >>> >>> Additional VMs added to hosted-engine storage (oVirt Reports VM and >>> Cinder VM). Additional VM's are hosted by other storage - cinder and >>> NFS. >>> >>> The system is in production. >>> >>> >>> Engine can be migrated around with the web interface. >>> >>> >>> - 3.6.4 upgrade released, follow the upgrade guide, engine is >>> upgraded >>> first , new Centos kernel requires host reboot. >>> >>> - Engine placed on h2 - h3 into maintenance (local) upgrade and >>> Reboot >>> h3 - No issues - Local maintenance removed from h3. >>> >>> - Engine placed on h3 - h2 into maintenance (local) upgrade and >>> Reboot >>> h2 - No issues - Local maintenance removed from h2. >>> >>> - Engine placed on h3 -h1 into mainteance (local) upgrade and reboot >>> h1 - >>> engine crashes and does not start elsewhere, VM(cinder) on h3 on >>> same >>> gluster volume pauses. >>> >>> - Host 1 takes about 5 minutes to reboot (Enterprise box with all >>> it's >>> normal BIOS probing) >>> >>> - Engine starts after h1 comes back and stabilises >>> >>> - VM(cinder) unpauses itself, VM(reports) continued fine the whole >>> time. >>> I can do no diagnosis on the 2 VMs as the engine is not available. >>> >>> - Local maintenance removed from h1 >>> >>> >>> I don't believe the issue is with gluster itself as the volume >>> remains >>> accessible on all hosts during this time albeit with a missing server >>> (gluster volume status) as each gluster server is rebooted. >>> >>> Gluster was upgraded as part of the process, no issues were seen >>> here. >>> >>> >>> I have been able to duplicate the issue without the upgrade by >>> following >>> the same sort of timeline. >>> >>> >>> ________________________________ >>> From: Sandro Bonazzola <sbonazzo@redhat.com> >>> Sent: Monday, 11 April 2016 7:11 PM >>> To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak; >>> Sahina >>> Bose >>> Cc: Bond, Darryl; users >>> Subject: Re: [ovirt-users] Hosted engine on gluster problem >>> >>> >>> >>> On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck >>> <hawk@tbi.univie.ac.at<mailto:hawk@tbi.univie.ac.at>> wrote: >>> Hi Darryl, >>> >>> I'm still experimenting with my oVirt installation so I tried to >>> recreate the problems you've described. >>> >>> My setup has three HA hosts for virtualization and three machines >>> for the gluster replica 3 setup. >>> >>> I manually migrated the Engine from the initial install host (one) >>> to host three. Then shut down host one manually and interrupted
>>> fencing mechanisms so the host stayed down. This didn't bother
>>> Engine VM at all. >>> >>> Did you move the host one to maintenance before shutting down? >>> Or is this a crash recovery test? >>> >>> >>> >>> To make things a bit more challenging I then shut down host three >>> while running the Engine VM. Of course the Engine was down for some >>> time until host two detected the problem. It started the Engine VM >>> and everything seems to be running quite well without the initial >>> install host. >>> >>> Thanks for the feedback! >>> >>> >>> >>> My only problem is that the HA agent on host two and three refuse to >>> start after a reboot due to the fact that the configuration of
>>> hosted engine is missing. I wrote another mail to >>> users@ovirt.org<mailto:users@ovirt.org> >>> about that. >>> >>> This is weird. Martin, Simone can you please investigate on
>>> >>> >>> >>> >>> Cheers >>> Richard >>> >>> On 04/08/2016 01:38 AM, Bond, Darryl wrote: >>> > There seems to be a pretty severe bug with using hosted engine on >>> > gluster. >>> > >>> > If the host that was used as the initial hosted-engine --deploy >>> > host >>> > goes away, the engine VM wil crash and cannot be restarted until >>> > the host >>> > comes back. >>> >>> is this an Hyperconverged setup? >>> >>> >>> > >>> > This is regardless of which host the engine was currently running. >>> > >>> > >>> > The issue seems to be buried in the bowels of VDSM and is not an >>> > issue >>> > with gluster itself. >>> >>> Sahina, can you please investigate on this? >>> >>> >>> > >>> > The gluster filesystem is still accessable from the host that was >>> > running the engine. The issue has been submitted to bugzilla but >>> > the fix is >>> > some way off (4.1). >>> > >>> > >>> > Can my hosted engine be converted to use NFS (using the gluster NFS >>> > server on the same filesystem) without rebuilding my hosted engine >>> > (ie >>> > change domainType=glusterfs to domainType=nfs)? >>> >>> > >>> > What effect would that have on the hosted-engine storage domain >>> > inside >>> > oVirt, ie would the same filesystem be mounted twice or would it >>> > just break. >>> > >>> > >>> > Will this actually fix the problem, does it have the same issue >>> > when >>> > the hosted engine is on NFS? >>> > >>> > >>> > Darryl >>> > >>> > >>> > >>> > >>> > ________________________________ >>> > >>> > The contents of this electronic message and any attachments are >>> > intended only for the addressee and may contain legally
>>> > personal, sensitive or confidential information. If you are not the >>> > intended >>> > addressee, and have received this email, any transmission, >>> > distribution, >>> > downloading, printing or photocopying of the contents of this >>> > message or >>> > attachments is strictly prohibited. Any legal privilege or >>> > confidentiality >>> > attached to this message and attachments is not waived, lost or >>> > destroyed by >>> > reason of delivery to any person other than intended addressee. If >>> > you have >>> > received this message and are not the intended addressee you should >>> > notify >>> > the sender by return email and destroy all copies of the message >>> > and any >>> > attachments. Unless expressly attributed, the views expressed in >>> > this email >>> > do not necessarily represent the views of the company. >>> > _______________________________________________ >>> > Users mailing list >>> > Users@ovirt.org<mailto:Users@ovirt.org> >>> > http://lists.ovirt.org/mailman/listinfo/users >>> > >>> >>> >>> -- >>> /dev/null >>> >>> >>> _______________________________________________ >>> Users mailing list >>> Users@ovirt.org<mailto:Users@ovirt.org> >>> http://lists.ovirt.org/mailman/listinfo/users >>> >>> >>> >>> >>> -- >>> Sandro Bonazzola >>> Better technology. Faster innovation. Powered by community >>> collaboration. >>> See how it works at redhat.com<http://redhat.com> >> >> >> >> >> -- >> Sandro Bonazzola >> Better technology. Faster innovation. Powered by community >> collaboration. >> See how it works at redhat.com >> _______________________________________________ >> Users mailing list >> Users@ovirt.org >> http://lists.ovirt.org/mailman/listinfo/users > > > _______________________________________________ > Users mailing list > Users@ovirt.org > http://lists.ovirt.org/mailman/listinfo/users >
-- Sandro Bonazzola Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com

On Tue, Aug 23, 2016 at 8:44 PM, David Gossage <dgossage@carouselchecks.com> wrote:
On Fri, Apr 15, 2016 at 8:00 AM, Luiz Claudio Prazeres Goncalves < luizcpg@gmail.com> wrote:
I'm not planning to move to ovirt 4 until it gets stable, so would be great to backport to 3.6 or ,ideally, gets developed on the next release of 3.6 branch. Considering the urgency (its a single point of failure) x complexity wouldn't be hard to make the proposed fix.
Bumping old email sorry. Looks like https://bugzilla.redhat.com/show_bug.cgi?id=1298693 was finished against 3.6.7 according to that RFE.
So does that mean if I add appropriate lines to my /etc/ovirt-hosted-engine/hosted-engine.conf the next time I restart engine and agent/brokers to mount that storage point it will utilize the backupvol-server features?
If so are appropriate settings outlined in docs somewhere?
Running ovirt 3.6.7 and gluster 3.8.2 on centos 7 nodes.
Adding Simone
I'm using today a production environment on top of gluster replica 3 and
this is the only SPF I have.
Thanks Luiz
Em sex, 15 de abr de 2016 03:05, Sandro Bonazzola <sbonazzo@redhat.com> escreveu:
On Thu, Apr 14, 2016 at 7:35 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Apr 13, 2016 at 4:34 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote:
Nir, here is the problem: https://bugzilla.redhat.com/show_bug.cgi?id=1298693
When you do a hosted-engine --deploy and pick "glusterfs" you don't have a way to define the mount options, therefore, the use of the "backupvol-server", however when you create a storage domain from the UI you can, like the attached screen shot.
In the hosted-engine --deploy, I would expect a flow which includes not only the "gluster" entrypoint, but also the gluster mount options which is missing today. This option would be optional, but would remove the single point of failure described on the Bug 1298693.
for example:
Existing entry point on the "hosted-engine --deploy" flow gluster1.xyz.com:/engine
I agree, this feature must be supported.
It will, and it's currently targeted to 4.0.
Missing option on the "hosted-engine --deploy" flow: backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log
Sandro, it seems to me a simple solution which can be easily fixed.
What do you think?
Regards -Luiz
2016-04-13 4:15 GMT-03:00 Sandro Bonazzola <sbonazzo@redhat.com>:
On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer@redhat.com>
> > On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves > <luizcpg@gmail.com> wrote: > > Hi Sandro, I've been using gluster with 3 external hosts for a while > > and > > things are working pretty well, however this single point of failure > > looks > > like a simple feature to implement,but critical to anyone who wants to > > use > > gluster on production . This is not hyperconvergency which has other > > issues/implications. So , why not have this feature out on 3.6 branch? > > It > > looks like just let vdsm use the 'backupvol-server' option when > > mounting the > > engine domain and make the property tests. > > Can you explain what is the problem, and what is the suggested solution? > > Engine and vdsm already support the backupvol-server option - you can > define this option in the storage domain options when you create a > gluster > storage domain. With this option vdsm should be able to connect to > gluster > storage domain even if a brick is down. > > If you don't have this option in engine , you probably cannot add it with > hosted > engine setup, since for editing it you must put the storage domain in > maintenance > and if you do this the engine vm will be killed :-) This is is one of > the issues with > engine managing the storage domain it runs on. > > I think the best way to avoid this issue, is to add a DNS entry > providing the addresses > of all the gluster bricks, and use this address for the gluster > storage domain. This way > the glusterfs mount helper can mount the domain even if one of the > gluster bricks > are down. > > Again, we will need some magic from the hosted engine developers to > modify the > address of the hosted engine gluster domain on existing system.
Magic won't happen without a bz :-) please open one describing what's requested.
> > > Nir > > > > > Could you add this feature to the next release of 3.6 branch? > > > > Thanks > > Luiz > > > > Em ter, 12 de abr de 2016 05:03, Sandro Bonazzola < sbonazzo@redhat.com> > > escreveu: > >> > >> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl < dbond@nrggos.com.au> > >> wrote: > >>> > >>> My setup is hyperconverged. I have placed my test results in > >>> https://bugzilla.redhat.com/show_bug.cgi?id=1298693 > >>> > >> > >> Ok, so you're aware about the limitation of the single point of > >> failure. > >> If you drop the host referenced in hosted engine configuration for the > >> initial setup it won't be able to connect to shared storage even if > >> the > >> other hosts in the cluster are up since the entry point is down. > >> Note that hyperconverged deployment is not supported in 3.6. > >> > >> > >>> > >>> > >>> Short description of setup: > >>> > >>> 3 hosts with 2 disks each set up with gluster replica 3 across
> >>> disks volume name hosted-engine. > >>> > >>> Hostname hosted-storage configured in /etc//hosts to point to
> >>> host1. > >>> > >>> Installed hosted engine on host1 with the hosted engine storage
> >>> = > >>> hosted-storage:/hosted-engine > >>> > >>> Install first engine on h1 successful. Hosts h2 and h3 added to
> >>> hosted engine. All works fine. > >>> > >>> Additional storage and non-hosted engine hosts added etc. > >>> > >>> Additional VMs added to hosted-engine storage (oVirt Reports VM and > >>> Cinder VM). Additional VM's are hosted by other storage - cinder and > >>> NFS. > >>> > >>> The system is in production. > >>> > >>> > >>> Engine can be migrated around with the web interface. > >>> > >>> > >>> - 3.6.4 upgrade released, follow the upgrade guide, engine is > >>> upgraded > >>> first , new Centos kernel requires host reboot. > >>> > >>> - Engine placed on h2 - h3 into maintenance (local) upgrade and > >>> Reboot > >>> h3 - No issues - Local maintenance removed from h3. > >>> > >>> - Engine placed on h3 - h2 into maintenance (local) upgrade and > >>> Reboot > >>> h2 - No issues - Local maintenance removed from h2. > >>> > >>> - Engine placed on h3 -h1 into mainteance (local) upgrade and reboot > >>> h1 - > >>> engine crashes and does not start elsewhere, VM(cinder) on h3 on > >>> same > >>> gluster volume pauses. > >>> > >>> - Host 1 takes about 5 minutes to reboot (Enterprise box with all > >>> it's > >>> normal BIOS probing) > >>> > >>> - Engine starts after h1 comes back and stabilises > >>> > >>> - VM(cinder) unpauses itself, VM(reports) continued fine the whole > >>> time. > >>> I can do no diagnosis on the 2 VMs as the engine is not available. > >>> > >>> - Local maintenance removed from h1 > >>> > >>> > >>> I don't believe the issue is with gluster itself as the volume > >>> remains > >>> accessible on all hosts during this time albeit with a missing server > >>> (gluster volume status) as each gluster server is rebooted. > >>> > >>> Gluster was upgraded as part of the process, no issues were seen > >>> here. > >>> > >>> > >>> I have been able to duplicate the issue without the upgrade by > >>> following > >>> the same sort of timeline. > >>> > >>> > >>> ________________________________ > >>> From: Sandro Bonazzola <sbonazzo@redhat.com> > >>> Sent: Monday, 11 April 2016 7:11 PM > >>> To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak; > >>> Sahina > >>> Bose > >>> Cc: Bond, Darryl; users > >>> Subject: Re: [ovirt-users] Hosted engine on gluster problem > >>> > >>> > >>> > >>> On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck > >>> <hawk@tbi.univie.ac.at<mailto:hawk@tbi.univie.ac.at>> wrote: > >>> Hi Darryl, > >>> > >>> I'm still experimenting with my oVirt installation so I tried to > >>> recreate the problems you've described. > >>> > >>> My setup has three HA hosts for virtualization and three machines > >>> for the gluster replica 3 setup. > >>> > >>> I manually migrated the Engine from the initial install host (one) > >>> to host three. Then shut down host one manually and interrupted
> >>> fencing mechanisms so the host stayed down. This didn't bother
> >>> Engine VM at all. > >>> > >>> Did you move the host one to maintenance before shutting down? > >>> Or is this a crash recovery test? > >>> > >>> > >>> > >>> To make things a bit more challenging I then shut down host
> >>> while running the Engine VM. Of course the Engine was down for some > >>> time until host two detected the problem. It started the Engine VM > >>> and everything seems to be running quite well without the initial > >>> install host. > >>> > >>> Thanks for the feedback! > >>> > >>> > >>> > >>> My only problem is that the HA agent on host two and three refuse to > >>> start after a reboot due to the fact that the configuration of
> >>> hosted engine is missing. I wrote another mail to > >>> users@ovirt.org<mailto:users@ovirt.org> > >>> about that. > >>> > >>> This is weird. Martin, Simone can you please investigate on
> >>> > >>> > >>> > >>> > >>> Cheers > >>> Richard > >>> > >>> On 04/08/2016 01:38 AM, Bond, Darryl wrote: > >>> > There seems to be a pretty severe bug with using hosted engine on > >>> > gluster. > >>> > > >>> > If the host that was used as the initial hosted-engine --deploy > >>> > host > >>> > goes away, the engine VM wil crash and cannot be restarted until > >>> > the host > >>> > comes back. > >>> > >>> is this an Hyperconverged setup? > >>> > >>> > >>> > > >>> > This is regardless of which host the engine was currently running. > >>> > > >>> > > >>> > The issue seems to be buried in the bowels of VDSM and is not an > >>> > issue > >>> > with gluster itself. > >>> > >>> Sahina, can you please investigate on this? > >>> > >>> > >>> > > >>> > The gluster filesystem is still accessable from the host that was > >>> > running the engine. The issue has been submitted to bugzilla but > >>> > the fix is > >>> > some way off (4.1). > >>> > > >>> > > >>> > Can my hosted engine be converted to use NFS (using the gluster NFS > >>> > server on the same filesystem) without rebuilding my hosted engine > >>> > (ie > >>> > change domainType=glusterfs to domainType=nfs)? > >>> > >>> > > >>> > What effect would that have on the hosted-engine storage domain > >>> > inside > >>> > oVirt, ie would the same filesystem be mounted twice or would it > >>> > just break. > >>> > > >>> > > >>> > Will this actually fix the problem, does it have the same issue > >>> > when > >>> > the hosted engine is on NFS? > >>> > > >>> > > >>> > Darryl > >>> > > >>> > > >>> > > >>> > > >>> > ________________________________ > >>> > > >>> > The contents of this electronic message and any attachments are > >>> > intended only for the addressee and may contain legally
> >>> > personal, sensitive or confidential information. If you are not the > >>> > intended > >>> > addressee, and have received this email, any transmission, > >>> > distribution, > >>> > downloading, printing or photocopying of the contents of this > >>> > message or > >>> > attachments is strictly prohibited. Any legal privilege or > >>> > confidentiality > >>> > attached to this message and attachments is not waived, lost or > >>> > destroyed by > >>> > reason of delivery to any person other than intended addressee. If > >>> > you have > >>> > received this message and are not the intended addressee you should > >>> > notify > >>> > the sender by return email and destroy all copies of the message > >>> > and any > >>> > attachments. Unless expressly attributed, the views expressed in > >>> > this email > >>> > do not necessarily represent the views of the company. > >>> > _______________________________________________ > >>> > Users mailing list > >>> > Users@ovirt.org<mailto:Users@ovirt.org> > >>> > http://lists.ovirt.org/mailman/listinfo/users > >>> > > >>> > >>> > >>> -- > >>> /dev/null > >>> > >>> > >>> _______________________________________________ > >>> Users mailing list > >>> Users@ovirt.org<mailto:Users@ovirt.org> > >>> http://lists.ovirt.org/mailman/listinfo/users > >>> > >>> > >>> > >>> > >>> -- > >>> Sandro Bonazzola > >>> Better technology. Faster innovation. Powered by community > >>> collaboration. > >>> See how it works at redhat.com<http://redhat.com> > >> > >> > >> > >> > >> -- > >> Sandro Bonazzola > >> Better technology. Faster innovation. Powered by community > >> collaboration. > >> See how it works at redhat.com > >> _______________________________________________ > >> Users mailing list > >> Users@ovirt.org > >> http://lists.ovirt.org/mailman/listinfo/users > > > > > > _______________________________________________ > > Users mailing list > > Users@ovirt.org > > http://lists.ovirt.org/mailman/listinfo/users > >
-- Sandro Bonazzola Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com

On Fri, Aug 26, 2016 at 8:54 AM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
On Tue, Aug 23, 2016 at 8:44 PM, David Gossage < dgossage@carouselchecks.com> wrote:
On Fri, Apr 15, 2016 at 8:00 AM, Luiz Claudio Prazeres Goncalves < luizcpg@gmail.com> wrote:
I'm not planning to move to ovirt 4 until it gets stable, so would be great to backport to 3.6 or ,ideally, gets developed on the next release of 3.6 branch. Considering the urgency (its a single point of failure) x complexity wouldn't be hard to make the proposed fix.
Bumping old email sorry. Looks like https://bugzilla.redhat.com/show_bug.cgi?id=1298693 was finished against 3.6.7 according to that RFE.
So does that mean if I add appropriate lines to my /etc/ovirt-hosted-engine/hosted-engine.conf the next time I restart engine and agent/brokers to mount that storage point it will utilize the backupvol-server features?
If so are appropriate settings outlined in docs somewhere?
Running ovirt 3.6.7 and gluster 3.8.2 on centos 7 nodes.
Adding Simone
First step: edit /etc/ovirt-hosted-engine/hosted-engine.conf on all your hosted-engine hosts to ensure that the storage field always points to the same entry point (host01, for instance). Then on each host you can add something like:
mnt_options=backupvolfile-server=host02.yourdomain.com:host03.yourdomain.com,fetch-attempts=2,log-level=WARNING,log-file=/var/log/engine_domain.log
Then check the representation of your storage connection in the storage_server_connections table of the engine DB and make sure that connection refers to the same entry point you used in hosted-engine.conf on all your hosts; lastly, set the value of mount_options there as well.
Please also tune the value of network.ping-timeout for your glusterFS volume to avoid this: https://bugzilla.redhat.com/show_bug.cgi?id=1319657#c17
You can find other information here: https://www.ovirt.org/develop/release-management/features/engine/self-hosted-engine-gluster-support/
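To make those steps concrete, here is a minimal sketch under a few assumptions: host01/host02/host03.yourdomain.com and the volume name "engine" are hypothetical, the engine database is named "engine" and is reachable with psql on the engine VM, the storage_server_connections column names are as I recall them, and 30 seconds is only an example ping-timeout value (see the referenced bug for guidance).

    # On every hosted-engine host: same entry point, plus backup servers
    # (sketch of /etc/ovirt-hosted-engine/hosted-engine.conf)
    storage=host01.yourdomain.com:/engine
    mnt_options=backupvolfile-server=host02.yourdomain.com:host03.yourdomain.com,fetch-attempts=2,log-level=WARNING,log-file=/var/log/engine_domain.log

    # On the engine VM: check the connection row and set mount_options to match
    # (table name from the mail above; psql invocation and column names are assumptions)
    sudo -u postgres psql engine -c "SELECT id, connection, mount_options FROM storage_server_connections;"
    sudo -u postgres psql engine -c "UPDATE storage_server_connections SET mount_options='backupvolfile-server=host02.yourdomain.com:host03.yourdomain.com' WHERE connection='host01.yourdomain.com:/engine';"

    # On a gluster node: lower the ping timeout for the engine volume (example value only)
    gluster volume set engine network.ping-timeout 30

    # On each host: restart the HA services so the new mount options are picked up
    systemctl restart ovirt-ha-broker ovirt-ha-agent

After that, the hosted-engine storage mount should be able to fall back to the backup servers if the primary entry point is unreachable.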
I'm using today a production environment on top of gluster replica 3 and
this is the only SPF I have.
Thanks Luiz
Em sex, 15 de abr de 2016 03:05, Sandro Bonazzola <sbonazzo@redhat.com> escreveu:
On Thu, Apr 14, 2016 at 7:35 PM, Nir Soffer <nsoffer@redhat.com> wrote:
Nir, here is the problem: https://bugzilla.redhat.com/show_bug.cgi?id=1298693
When you do a hosted-engine --deploy and pick "glusterfs" you don't have a way to define the mount options, therefore, the use of the "backupvol-server", however when you create a storage domain from
On Wed, Apr 13, 2016 at 4:34 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote: the UI you
can, like the attached screen shot.
In the hosted-engine --deploy, I would expect a flow which includes not only the "gluster" entrypoint, but also the gluster mount options which is missing today. This option would be optional, but would remove the single point of failure described on the Bug 1298693.
for example:
Existing entry point on the "hosted-engine --deploy" flow gluster1.xyz.com:/engine
I agree, this feature must be supported.
It will, and it's currently targeted to 4.0.
Missing option on the "hosted-engine --deploy" flow: backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-level=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log
Sandro, it seems to me a simple solution which can be easily fixed.
What do you think?
Regards -Luiz
2016-04-13 4:15 GMT-03:00 Sandro Bonazzola <sbonazzo@redhat.com>: > > > > On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer@redhat.com> wrote: >> >> On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves >> <luizcpg@gmail.com> wrote: >> > Hi Sandro, I've been using gluster with 3 external hosts for a while >> > and >> > things are working pretty well, however this single point of failure >> > looks >> > like a simple feature to implement,but critical to anyone who wants to >> > use >> > gluster on production . This is not hyperconvergency which has other >> > issues/implications. So , why not have this feature out on 3.6 branch? >> > It >> > looks like just let vdsm use the 'backupvol-server' option when >> > mounting the >> > engine domain and make the property tests. >> >> Can you explain what is the problem, and what is the suggested solution? >> >> Engine and vdsm already support the backupvol-server option - you can >> define this option in the storage domain options when you create a >> gluster >> storage domain. With this option vdsm should be able to connect to >> gluster >> storage domain even if a brick is down. >> >> If you don't have this option in engine , you probably cannot add it with >> hosted >> engine setup, since for editing it you must put the storage domain in >> maintenance >> and if you do this the engine vm will be killed :-) This is is one of >> the issues with >> engine managing the storage domain it runs on. >> >> I think the best way to avoid this issue, is to add a DNS entry >> providing the addresses >> of all the gluster bricks, and use this address for the gluster >> storage domain. This way >> the glusterfs mount helper can mount the domain even if one of the >> gluster bricks >> are down. >> >> Again, we will need some magic from the hosted engine developers to >> modify the >> address of the hosted engine gluster domain on existing system. > > > Magic won't happen without a bz :-) please open one describing what's > requested. > > >> >> >> Nir >> >> > >> > Could you add this feature to the next release of 3.6 branch? >> > >> > Thanks >> > Luiz >> > >> > Em ter, 12 de abr de 2016 05:03, Sandro Bonazzola < sbonazzo@redhat.com> >> > escreveu: >> >> >> >> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl < dbond@nrggos.com.au> >> >> wrote: >> >>> >> >>> My setup is hyperconverged. I have placed my test results in >> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1298693 >> >>> >> >> >> >> Ok, so you're aware about the limitation of the single point of >> >> failure. >> >> If you drop the host referenced in hosted engine configuration for the >> >> initial setup it won't be able to connect to shared storage even if >> >> the >> >> other hosts in the cluster are up since the entry point is down. >> >> Note that hyperconverged deployment is not supported in 3.6. >> >> >> >> >> >>> >> >>> >> >>> Short description of setup: >> >>> >> >>> 3 hosts with 2 disks each set up with gluster replica 3 across the 6 >> >>> disks volume name hosted-engine. >> >>> >> >>> Hostname hosted-storage configured in /etc//hosts to point to the >> >>> host1. >> >>> >> >>> Installed hosted engine on host1 with the hosted engine storage path >> >>> = >> >>> hosted-storage:/hosted-engine >> >>> >> >>> Install first engine on h1 successful. Hosts h2 and h3 added to the >> >>> hosted engine. All works fine. >> >>> >> >>> Additional storage and non-hosted engine hosts added etc. >> >>> >> >>> Additional VMs added to hosted-engine storage (oVirt Reports VM and >> >>> Cinder VM). 
Additional VM's are hosted by other storage - cinder and >> >>> NFS. >> >>> >> >>> The system is in production. >> >>> >> >>> >> >>> Engine can be migrated around with the web interface. >> >>> >> >>> >> >>> - 3.6.4 upgrade released, follow the upgrade guide, engine is >> >>> upgraded >> >>> first , new Centos kernel requires host reboot. >> >>> >> >>> - Engine placed on h2 - h3 into maintenance (local) upgrade and >> >>> Reboot >> >>> h3 - No issues - Local maintenance removed from h3. >> >>> >> >>> - Engine placed on h3 - h2 into maintenance (local) upgrade and >> >>> Reboot >> >>> h2 - No issues - Local maintenance removed from h2. >> >>> >> >>> - Engine placed on h3 -h1 into mainteance (local) upgrade and reboot >> >>> h1 - >> >>> engine crashes and does not start elsewhere, VM(cinder) on h3 on >> >>> same >> >>> gluster volume pauses. >> >>> >> >>> - Host 1 takes about 5 minutes to reboot (Enterprise box with all >> >>> it's >> >>> normal BIOS probing) >> >>> >> >>> - Engine starts after h1 comes back and stabilises >> >>> >> >>> - VM(cinder) unpauses itself, VM(reports) continued fine the whole >> >>> time. >> >>> I can do no diagnosis on the 2 VMs as the engine is not available. >> >>> >> >>> - Local maintenance removed from h1 >> >>> >> >>> >> >>> I don't believe the issue is with gluster itself as the volume >> >>> remains >> >>> accessible on all hosts during this time albeit with a missing server >> >>> (gluster volume status) as each gluster server is rebooted. >> >>> >> >>> Gluster was upgraded as part of the process, no issues were seen >> >>> here. >> >>> >> >>> >> >>> I have been able to duplicate the issue without the upgrade by >> >>> following >> >>> the same sort of timeline. >> >>> >> >>> >> >>> ________________________________ >> >>> From: Sandro Bonazzola <sbonazzo@redhat.com> >> >>> Sent: Monday, 11 April 2016 7:11 PM >> >>> To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak; >> >>> Sahina >> >>> Bose >> >>> Cc: Bond, Darryl; users >> >>> Subject: Re: [ovirt-users] Hosted engine on gluster problem >> >>> >> >>> >> >>> >> >>> On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck >> >>> <hawk@tbi.univie.ac.at<mailto:hawk@tbi.univie.ac.at>> wrote: >> >>> Hi Darryl, >> >>> >> >>> I'm still experimenting with my oVirt installation so I tried to >> >>> recreate the problems you've described. >> >>> >> >>> My setup has three HA hosts for virtualization and three machines >> >>> for the gluster replica 3 setup. >> >>> >> >>> I manually migrated the Engine from the initial install host (one) >> >>> to host three. Then shut down host one manually and interrupted the >> >>> fencing mechanisms so the host stayed down. This didn't bother the >> >>> Engine VM at all. >> >>> >> >>> Did you move the host one to maintenance before shutting down? >> >>> Or is this a crash recovery test? >> >>> >> >>> >> >>> >> >>> To make things a bit more challenging I then shut down host three >> >>> while running the Engine VM. Of course the Engine was down for some >> >>> time until host two detected the problem. It started the Engine VM >> >>> and everything seems to be running quite well without the initial >> >>> install host. >> >>> >> >>> Thanks for the feedback! >> >>> >> >>> >> >>> >> >>> My only problem is that the HA agent on host two and three refuse to >> >>> start after a reboot due to the fact that the configuration of the >> >>> hosted engine is missing. I wrote another mail to >> >>> users@ovirt.org<mailto:users@ovirt.org> >> >>> about that. >> >>> >> >>> This is weird. 
Martin, Simone can you please investigate on this? >> >>> >> >>> >> >>> >> >>> >> >>> Cheers >> >>> Richard >> >>> >> >>> On 04/08/2016 01:38 AM, Bond, Darryl wrote: >> >>> > There seems to be a pretty severe bug with using hosted engine on >> >>> > gluster. >> >>> > >> >>> > If the host that was used as the initial hosted-engine --deploy >> >>> > host >> >>> > goes away, the engine VM wil crash and cannot be restarted until >> >>> > the host >> >>> > comes back. >> >>> >> >>> is this an Hyperconverged setup? >> >>> >> >>> >> >>> > >> >>> > This is regardless of which host the engine was currently running. >> >>> > >> >>> > >> >>> > The issue seems to be buried in the bowels of VDSM and is not an >> >>> > issue >> >>> > with gluster itself. >> >>> >> >>> Sahina, can you please investigate on this? >> >>> >> >>> >> >>> > >> >>> > The gluster filesystem is still accessable from the host that was >> >>> > running the engine. The issue has been submitted to bugzilla but >> >>> > the fix is >> >>> > some way off (4.1). >> >>> > >> >>> > >> >>> > Can my hosted engine be converted to use NFS (using the gluster NFS >> >>> > server on the same filesystem) without rebuilding my hosted engine >> >>> > (ie >> >>> > change domainType=glusterfs to domainType=nfs)? >> >>> >> >>> > >> >>> > What effect would that have on the hosted-engine storage domain >> >>> > inside >> >>> > oVirt, ie would the same filesystem be mounted twice or would it >> >>> > just break. >> >>> > >> >>> > >> >>> > Will this actually fix the problem, does it have the same issue >> >>> > when >> >>> > the hosted engine is on NFS? >> >>> > >> >>> > >> >>> > Darryl >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > ________________________________ >> >>> > >> >>> > The contents of this electronic message and any attachments are >> >>> > intended only for the addressee and may contain legally privileged, >> >>> > personal, sensitive or confidential information. If you are not the >> >>> > intended >> >>> > addressee, and have received this email, any transmission, >> >>> > distribution, >> >>> > downloading, printing or photocopying of the contents of this >> >>> > message or >> >>> > attachments is strictly prohibited. Any legal privilege or >> >>> > confidentiality >> >>> > attached to this message and attachments is not waived, lost or >> >>> > destroyed by >> >>> > reason of delivery to any person other than intended addressee. If >> >>> > you have >> >>> > received this message and are not the intended addressee you should >> >>> > notify >> >>> > the sender by return email and destroy all copies of the message >> >>> > and any >> >>> > attachments. Unless expressly attributed, the views expressed in >> >>> > this email >> >>> > do not necessarily represent the views of the company. >> >>> > _______________________________________________ >> >>> > Users mailing list >> >>> > Users@ovirt.org<mailto:Users@ovirt.org> >> >>> > http://lists.ovirt.org/mailman/listinfo/users >> >>> > >> >>> >> >>> >> >>> -- >> >>> /dev/null >> >>> >> >>> >> >>> _______________________________________________ >> >>> Users mailing list >> >>> Users@ovirt.org<mailto:Users@ovirt.org> >> >>> http://lists.ovirt.org/mailman/listinfo/users >> >>> >> >>> >> >>> >> >>> >> >>> -- >> >>> Sandro Bonazzola >> >>> Better technology. Faster innovation. Powered by community >> >>> collaboration. >> >>> See how it works at redhat.com<http://redhat.com> >> >> >> >> >> >> >> >> >> >> -- >> >> Sandro Bonazzola >> >> Better technology. Faster innovation. 
Powered by community >> >> collaboration. >> >> See how it works at redhat.com >> >> _______________________________________________ >> >> Users mailing list >> >> Users@ovirt.org >> >> http://lists.ovirt.org/mailman/listinfo/users >> > >> > >> > _______________________________________________ >> > Users mailing list >> > Users@ovirt.org >> > http://lists.ovirt.org/mailman/listinfo/users >> > > > > > > -- > Sandro Bonazzola > Better technology. Faster innovation. Powered by community
collaboration. See how it works at redhat.com
-- Sandro Bonazzola Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com

On Mon, Aug 29, 2016 at 10:47 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Fri, Aug 26, 2016 at 8:54 AM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
On Tue, Aug 23, 2016 at 8:44 PM, David Gossage < dgossage@carouselchecks.com> wrote:
On Fri, Apr 15, 2016 at 8:00 AM, Luiz Claudio Prazeres Goncalves < luizcpg@gmail.com> wrote:
I'm not planning to move to ovirt 4 until it gets stable, so would be great to backport to 3.6 or ,ideally, gets developed on the next release of 3.6 branch. Considering the urgency (its a single point of failure) x complexity wouldn't be hard to make the proposed fix.
Bumping old email sorry. Looks like https://bugzilla.redhat.com/show_bug.cgi?id=1298693 was finished against 3.6.7 according to that RFE.
So does that mean if I add appropriate lines to my /etc/ovirt-hosted-engine/hosted-engine.conf the next time I restart engine and agent/brokers to mount that storage point it will utilize the backupvol-server features?
If so are appropriate settings outlined in docs somewhere?
Running ovirt 3.6.7 and gluster 3.8.2 on centos 7 nodes.
Adding Simone
First step, you have to edit /etc/ovirt-hosted-engine/hosted-engine.conf on all your hosted-engine hosts to ensure that the storage field always point to the same entry point (host01 for instance) Then on each host you can add something like: mnt_options=backupvolfile-server=host02.yourdomain.com:host03.yourdomain.com,fetch-attempts=2,log-level=WARNING,log-file=/var/log/engine_domain.log
Then check the representation of your storage connection in the table storage_server_connections of the engine DB and make sure that connection refers to the entry point you used in hosted-engine.conf on all your hosts, you have lastly to set the value of mount_options also here.
Please tune also the value of network.ping-timeout for your glusterFS volume to avoid this: https://bugzilla.redhat.com/show_bug.cgi?id=1319657#c17
You can find other information here: https://www.ovirt.org/develop/release-management/features/engine/self-hosted-engine-gluster-support/
Thanks, I'll review all that information.
I'm using today a production environment on top of gluster replica 3 and
this is the only SPF I have.
Thanks Luiz
Em sex, 15 de abr de 2016 03:05, Sandro Bonazzola <sbonazzo@redhat.com> escreveu:
On Thu, Apr 14, 2016 at 7:35 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Wed, Apr 13, 2016 at 4:34 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote: > Nir, here is the problem: > https://bugzilla.redhat.com/show_bug.cgi?id=1298693 > > When you do a hosted-engine --deploy and pick "glusterfs" you don't have a > way to define the mount options, therefore, the use of the > "backupvol-server", however when you create a storage domain from the UI you > can, like the attached screen shot. > > > In the hosted-engine --deploy, I would expect a flow which includes not only > the "gluster" entrypoint, but also the gluster mount options which is > missing today. This option would be optional, but would remove the single > point of failure described on the Bug 1298693. > > for example: > > Existing entry point on the "hosted-engine --deploy" flow > gluster1.xyz.com:/engine
I agree, this feature must be supported.
It will, and it's currently targeted to 4.0.
> Missing option on the "hosted-engine --deploy" flow : > backupvolfile-server=gluster2.xyz.com,fetch-attempts=3,log-l evel=WARNING,log-file=/var/log/glusterfs/gluster_engine_domain.log > > Sandro, it seems to me a simple solution which can be easily fixed. > > What do you think? > > Regards > -Luiz > > > > 2016-04-13 4:15 GMT-03:00 Sandro Bonazzola <sbonazzo@redhat.com>: >> >> >> >> On Tue, Apr 12, 2016 at 6:47 PM, Nir Soffer <nsoffer@redhat.com> wrote: >>> >>> On Tue, Apr 12, 2016 at 3:05 PM, Luiz Claudio Prazeres Goncalves >>> <luizcpg@gmail.com> wrote: >>> > Hi Sandro, I've been using gluster with 3 external hosts for a while >>> > and >>> > things are working pretty well, however this single point of failure >>> > looks >>> > like a simple feature to implement,but critical to anyone who wants to >>> > use >>> > gluster on production . This is not hyperconvergency which has other >>> > issues/implications. So , why not have this feature out on 3.6 branch? >>> > It >>> > looks like just let vdsm use the 'backupvol-server' option when >>> > mounting the >>> > engine domain and make the property tests. >>> >>> Can you explain what is the problem, and what is the suggested solution? >>> >>> Engine and vdsm already support the backupvol-server option - you can >>> define this option in the storage domain options when you create a >>> gluster >>> storage domain. With this option vdsm should be able to connect to >>> gluster >>> storage domain even if a brick is down. >>> >>> If you don't have this option in engine , you probably cannot add it with >>> hosted >>> engine setup, since for editing it you must put the storage domain in >>> maintenance >>> and if you do this the engine vm will be killed :-) This is is one of >>> the issues with >>> engine managing the storage domain it runs on. >>> >>> I think the best way to avoid this issue, is to add a DNS entry >>> providing the addresses >>> of all the gluster bricks, and use this address for the gluster >>> storage domain. This way >>> the glusterfs mount helper can mount the domain even if one of the >>> gluster bricks >>> are down. >>> >>> Again, we will need some magic from the hosted engine developers to >>> modify the >>> address of the hosted engine gluster domain on existing system. >> >> >> Magic won't happen without a bz :-) please open one describing what's >> requested. >> >> >>> >>> >>> Nir >>> >>> > >>> > Could you add this feature to the next release of 3.6 branch? >>> > >>> > Thanks >>> > Luiz >>> > >>> > Em ter, 12 de abr de 2016 05:03, Sandro Bonazzola < sbonazzo@redhat.com> >>> > escreveu: >>> >> >>> >> On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl < dbond@nrggos.com.au> >>> >> wrote: >>> >>> >>> >>> My setup is hyperconverged. I have placed my test results in >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1298693 >>> >>> >>> >> >>> >> Ok, so you're aware about the limitation of the single point of >>> >> failure. >>> >> If you drop the host referenced in hosted engine configuration for the >>> >> initial setup it won't be able to connect to shared storage even if >>> >> the >>> >> other hosts in the cluster are up since the entry point is down. >>> >> Note that hyperconverged deployment is not supported in 3.6. >>> >> >>> >> >>> >>> >>> >>> >>> >>> Short description of setup: >>> >>> >>> >>> 3 hosts with 2 disks each set up with gluster replica 3 across the 6 >>> >>> disks volume name hosted-engine. >>> >>> >>> >>> Hostname hosted-storage configured in /etc//hosts to point to the >>> >>> host1. 

On Tue, Apr 12, 2016 at 2:05 PM, Luiz Claudio Prazeres Goncalves <luizcpg@gmail.com> wrote:
Hi Sandro, I've been using gluster with 3 external hosts for a while and things are working pretty well. However, this single point of failure looks like a simple feature to fix, and it is critical to anyone who wants to use gluster in production. This is not hyperconvergence, which has other issues/implications. So, why not have this feature in the 3.6 branch? It looks like it would just be a matter of letting vdsm use the 'backupvol-server' option when mounting the engine domain and doing the proper tests.
Could you add this feature to the next release of the 3.6 branch?
Having it in 3.6 is hard in terms of integration team capacity. Also consider that 4.0 will probably be out before we would be able to backport it to 3.6.
Thanks Luiz
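
For reference, what Luiz describes maps onto the gluster mount option usually spelled backup-volfile-servers; the sketch below is only illustrative (the host names host1/host2/host3 and the mnt_options key are assumptions about this kind of setup, not something confirmed in the thread):

  # A plain gluster fuse mount can already fall back to other servers for fetching the volfile:
  mount -t glusterfs -o backup-volfile-servers=host2:host3 host1:/hosted-engine /mnt/he-test

  # The request is essentially that the hosted-engine/VDSM mount of the engine domain pass the
  # same option, e.g. via something like the following in /etc/ovirt-hosted-engine/hosted-engine.conf
  # (whether that file honours a mnt_options key depends on the installed version):
  #   mnt_options=backup-volfile-servers=host2:host3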
On Tue, 12 Apr 2016 at 05:03, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl <dbond@nrggos.com.au> wrote:
My setup is hyperconverged. I have placed my test results in https://bugzilla.redhat.com/show_bug.cgi?id=1298693
Ok, so you're aware of the limitation of the single point of failure. If you drop the host referenced in the hosted-engine configuration for the initial setup, it won't be able to connect to the shared storage even if the other hosts in the cluster are up, since the entry point is down. Note that hyperconverged deployment is not supported in 3.6.
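
For context, the "entry point" here is the single storage server recorded at deploy time in /etc/ovirt-hosted-engine/hosted-engine.conf. A simplified sketch of the relevant entries, with illustrative values based on the setup described below (the IP address is made up):

  # /etc/ovirt-hosted-engine/hosted-engine.conf (excerpt, illustrative values)
  storage=hosted-storage:/hosted-engine
  domainType=glusterfs
  # 'hosted-storage' resolves only to host1, e.g. via /etc/hosts:
  #   10.0.0.1   hosted-storage
  # so when host1 is down the HA agents have no other path to the engine storage domain.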
Short description of setup:
3 hosts with 2 disks each, set up with gluster replica 3 across the 6 disks; volume name hosted-engine.
Hostname hosted-storage configured in /etc/hosts to point to host1.
Installed hosted engine on host1 with the hosted engine storage path = hosted-storage:/hosted-engine
Install first engine on h1 successful. Hosts h2 and h3 added to the hosted engine. All works fine.
Additional storage and non-hosted engine hosts added etc.
Additional VMs added to hosted-engine storage (oVirt Reports VM and Cinder VM). Further VMs are hosted on other storage - cinder and NFS.
The system is in production.
Engine can be migrated around with the web interface.
- 3.6.4 upgrade released; followed the upgrade guide, engine is upgraded first, new CentOS kernel requires host reboot.
- Engine placed on h2; h3 into maintenance (local), upgrade and reboot h3 - no issues - local maintenance removed from h3.
- Engine placed on h3; h2 into maintenance (local), upgrade and reboot h2 - no issues - local maintenance removed from h2.
- Engine placed on h3; h1 into maintenance (local), upgrade and reboot h1 - engine crashes and does not start elsewhere, VM(cinder) on h3 on the same gluster volume pauses.
- Host 1 takes about 5 minutes to reboot (enterprise box with all its normal BIOS probing).
- Engine starts after h1 comes back and stabilises
- VM(cinder) unpauses itself; VM(reports) continued fine the whole time. I could do no diagnosis on the two VMs as the engine was not available.
- Local maintenance removed from h1
I don't believe the issue is with gluster itself as the volume remains accessible on all hosts during this time albeit with a missing server (gluster volume status) as each gluster server is rebooted.
Gluster was upgraded as part of the process, no issues were seen here.
I have been able to duplicate the issue without the upgrade by following the same sort of timeline.
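
For anyone trying to reproduce this, a minimal sketch of the checks that can be run on the surviving hosts while the initial deploy host is down; the volume name and mount path follow the setup above and may differ on other deployments:

  # On h2/h3 while h1 (the initial deploy host) is rebooting:
  hosted-engine --vm-status              # HA agents' view of the engine VM
  gluster volume status hosted-engine    # volume should stay online with one server missing
  gluster peer status                    # remaining peers should still be connected
  grep glusterSD /proc/mounts            # is the engine storage domain still fuse-mounted?
  ls /rhev/data-center/mnt/glusterSD/    # mount directories for gluster storage domains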
________________________________
From: Sandro Bonazzola <sbonazzo@redhat.com>
Sent: Monday, 11 April 2016 7:11 PM
To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak; Sahina Bose
Cc: Bond, Darryl; users
Subject: Re: [ovirt-users] Hosted engine on gluster problem
On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck <hawk@tbi.univie.ac.at> wrote:
> Hi Darryl,
>
> I'm still experimenting with my oVirt installation so I tried to recreate the problems you've described.
>
> My setup has three HA hosts for virtualization and three machines for the gluster replica 3 setup.
>
> I manually migrated the Engine from the initial install host (one) to host three. Then shut down host one manually and interrupted the fencing mechanisms so the host stayed down. This didn't bother the Engine VM at all.

Did you move host one to maintenance before shutting down? Or is this a crash recovery test?

> To make things a bit more challenging I then shut down host three while running the Engine VM. Of course the Engine was down for some time until host two detected the problem. It started the Engine VM and everything seems to be running quite well without the initial install host.

Thanks for the feedback!

> My only problem is that the HA agent on host two and three refuse to start after a reboot due to the fact that the configuration of the hosted engine is missing. I wrote another mail to users@ovirt.org about that.

This is weird. Martin, Simone, can you please investigate on this?

> Cheers
> Richard
>
> On 04/08/2016 01:38 AM, Bond, Darryl wrote:
>> There seems to be a pretty severe bug with using hosted engine on gluster.
>>
>> If the host that was used as the initial hosted-engine --deploy host goes away, the engine VM will crash and cannot be restarted until the host comes back.

Is this a hyperconverged setup?

>> This is regardless of which host the engine was currently running.
>>
>> The issue seems to be buried in the bowels of VDSM and is not an issue with gluster itself.

Sahina, can you please investigate on this?

>> The gluster filesystem is still accessible from the host that was running the engine. The issue has been submitted to bugzilla but the fix is some way off (4.1).
>>
>> Can my hosted engine be converted to use NFS (using the gluster NFS server on the same filesystem) without rebuilding my hosted engine (ie change domainType=glusterfs to domainType=nfs)?
>>
>> What effect would that have on the hosted-engine storage domain inside oVirt, ie would the same filesystem be mounted twice or would it just break.
>>
>> Will this actually fix the problem, does it have the same issue when the hosted engine is on NFS?
>>
>> Darryl

On 04/12/2016 01:33 PM, Sandro Bonazzola wrote:
On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl <dbond@nrggos.com.au> wrote:
>> My setup is hyperconverged. I have placed my test results in https://bugzilla.redhat.com/show_bug.cgi?id=1298693

> Ok, so you're aware of the limitation of the single point of failure. If you drop the host referenced in the hosted-engine configuration for the initial setup, it won't be able to connect to the shared storage even if the other hosts in the cluster are up, since the entry point is down. Note that hyperconverged deployment is not supported in 3.6.
This issue does not seem related to the single point of failure. Tested this on a 3 node setup with each node mounting the volume hosting HE as localhost:/engine. Since all nodes have glusterd running and belong to the same cluster, with any one node down the mount should continue to work. But the HE VM is restarted once a node is powered off.

broker.log:

Thread-4602::ERROR::2016-04-13 18:50:28,249::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=7fe3707b-2435-4e71-b831-4daba08cc72c'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 456, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 7fe3707b-2435-4e71-b831-4daba08cc72c not found in /rhev/data-center/mnt/glusterSD

agent.log:

MainThread::INFO::2016-04-13 18:50:26,020::storage_server::207::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2016-04-13 18:50:28,054::hosted_engine::807::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor for 7fe3707b-2435-4e71-b831-4daba08cc72c
MainThread::INFO::2016-04-13 18:50:28,055::image::184::ovirt_hosted_engine_ha.lib.image.Image::(teardown_images) Teardown images
MainThread::WARNING::2016-04-13 18:50:28,177::hosted_engine::675::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Disconnecting the storage
MainThread::INFO::2016-04-13 18:50:28,177::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server) Disconnecting storage server

The gluster mount logs for this time frame contain unmount messages:

[2016-04-13 13:20:28.199429] I [fuse-bridge.c:4997:fuse_thread_proc] 0-fuse: unmounting /rhev/data-center/mnt/glusterSD/localhost:_engine
[2016-04-13 13:20:28.199934] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff9b3ceddc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7ff9b53588b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7ff9b5358739] ) 0-: received signum (15), shutting down
[2016-04-13 13:20:28.199970] I [fuse-bridge.c:5704:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/localhost:_engine'.
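
A rough way to confirm and recover from the state shown in these logs, on the host where the broker raised BackendFailureException; the paths come from the logs above, and the restart step is an assumption about what re-triggers the storage connection rather than a documented fix:

  # Check whether the engine volume is still fuse-mounted and the domain directory is visible:
  grep glusterSD /proc/mounts
  ls /rhev/data-center/mnt/glusterSD/*/            # should contain 7fe3707b-2435-4e71-b831-4daba08cc72c
  # If the mount was torn down, restarting the HA services should reconnect the storage server:
  systemctl restart ovirt-ha-broker ovirt-ha-agent
  hosted-engine --vm-status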
participants (8)
- Bond, Darryl
- David Gossage
- Luiz Claudio Prazeres Goncalves
- Nir Soffer
- Richard Neuboeck
- Sahina Bose
- Sandro Bonazzola
- Simone Tiraboschi