
Date: Tue, 11 Mar 2014 15:16:36 +0100
From: sbonazzo@redhat.com
To: giuseppe.ragusa@hotmail.com; jbrooks@redhat.com; msivak@redhat.com
CC: users@ovirt.org; fsimonce@redhat.com; gpadgett@redhat.com
Subject: Re: [Users] hosted engine help

On 10/03/2014 22:32, Giuseppe Ragusa wrote:
Hi all,
Date: Mon, 10 Mar 2014 12:56:19 -0400
From: jbrooks@redhat.com
To: msivak@redhat.com
CC: users@ovirt.org
Subject: Re: [Users] hosted engine help
----- Original Message -----
From: "Martin Sivak" <msivak@redhat.com> To: "Dan Kenigsberg" <danken@redhat.com> Cc: users@ovirt.org Sent: Saturday=2C March 8=2C 2014 11:52:59 PM Subject: Re: [Users] hosted engine help
Hi Jason,
can you please attach the full logs? We had a very similar issue before and we need to see whether it is the same one or not.
I may have to recreate it -- I switched back to an all-in-one engine after my setup started refusing to run the engine at all. It's no fun losing your engine!
This was a migrated-from-standalone setup, maybe that caused additional wrinkles...
Jason
Thanks
I experienced the exact same symptoms as Jason on a from-scratch installation on two physical nodes with CentOS 6.5 (fully up-to-date) using oVirt 3.4.0_pre (latest test-day release) and GlusterFS 3.5.0beta3 (with Gluster-provided NFS as storage for the self-hosted engine VM only).

Using GlusterFS with hosted-engine storage is not supported and not recommended. The HA daemon may not work properly there.
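For clarity, by "Gluster-provided NFS" I mean the volume's built-in NFS server, not the kernel NFS server; the engine volume was prepared roughly like this (the volume name and brick paths here are only illustrative):

  gluster volume create engine replica 2 node1:/gluster/engine/brick node2:/gluster/engine/brick
  gluster volume start engine

  # Gluster's NFS export should then be visible from both nodes
  showmount -e node1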
If it is unsupported (and particularly "not recommended") even with the interposed NFS (the native Gluster-provided NFSv3 export of a volume), then which is the recommended way to set up a fault-tolerant, load-balanced 2-node oVirt cluster (without an external dedicated SAN/NAS)?
I roughly followed the guide from Andrew Lau:

http://www.andrewklau.com/ovirt-hosted-engine-with-3-4-0-nightly/

with some variations due to newer packages (resolved bugs) and a different hardware setup (no VLANs in my setup: physically separated networks; a custom second NIC added to the Engine VM template before deploying, etc.)

The self-hosted installation on the first node + Engine VM (configured for managing both oVirt and the storage; Datacenter default set to NFS because no GlusterFS was offered) went apparently smoothly, but the HA-agent failed to start at the very end (same errors in the logs as Jason: the storage domain seems "missing") and I was only able to start it all manually with:

  hosted-engine --connect-storage
  hosted-engine --start-pool

The above commands are used for development and shouldn't be used for starting the engine.
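For reference, a minimal sketch of the non-development entry points as I understand them (please correct me if the intended flow is different):

  # ask the HA agents about the engine VM and its storage
  hosted-engine --vm-status

  # start the engine VM through the regular command instead of the pool commands above
  hosted-engine --vm-start

  # restart the HA services after fixing the storage (EL6 init scripts)
  service ovirt-ha-broker restart
  service ovirt-ha-agent restart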
Directly starting the engine (with the command below) failed because of storage unavailability, so I used the above "trick" as a "last resort" to at least prove that the engine was able to start and had not been somehow "destroyed" or "lost" in the process (but I do understand that it is an extreme debug-only action).
  hosted-engine --vm-start

then the Engine came up and I could use it; I even registered the second node (same final error in the HA-agent) and tried to add GlusterFS storage domains for further VMs and ISOs (by the way: the original NFS-GlusterFS domain for the Engine VM only is not present inside the Engine web UI), but it always failed activating the domains (they remain "Inactive").

Furthermore, the engine gets killed some time after starting (from 3 up to 11 hours later) and the only way to get it back is repeating the above commands.

Need logs for this.
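In case it helps for the next run, these are the checks I plan to capture whenever the storage looks unavailable, assuming a stock install where vdsm mounts the domains under /rhev/data-center/mnt:

  # is the hosted-engine NFS domain actually mounted by vdsm?
  mount | grep rhev
  ls /rhev/data-center/mnt/

  # is the Gluster NFS export reachable from the host at all?
  showmount -e node1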
I always managed GlusterFS "natively" (not through oVirt) from the command line and verified that the NFS-exported Engine-VM-only volume gets replicated, but I obviously failed to try migration because the HA part results inactive and oVirt refuses to migrate the Engine.

Since I tried many times, with variations and further manual actions in between (like trying to manually mount the NFS Engine domain, restarting the HA-agent only, etc.), my logs are "cluttered", so I should start from scratch again and pack up all logs in one swipe.

+1

;>

I will try to reproduce it all, but I can recall that in the libvirt logs (HostedEngine.log) there was always a clear indication of the PID that killed the Engine VM, and each time it belonged to an instance of sanlock.
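Next time it happens I will also double-check the sanlock angle with something like the following, assuming default log locations on EL6:

  # sanlock's own view of its lockspaces and held leases
  sanlock client status

  # look for the engine qemu process being terminated in the per-VM libvirt log
  grep -i terminat /var/log/libvirt/qemu/HostedEngine.log

  # and for lease renewal failures / kill decisions in sanlock's own log
  grep -iE "renewal|kill" /var/log/sanlock.log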
Tell me what I should capture and at which points in the whole process, and I will try to follow up as soon as possible.

What:
hosted-engine-setup, hosted-engine-ha, vdsm, libvirt and sanlock from the physical hosts, and engine and server logs from the hosted engine VM.

When:
As soon as you see an error.
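Something like the following should pack up everything listed above from a host in one swipe (paths assumed from a default EL6 install, adjust as needed):

  tar czf hosted-engine-logs-$(hostname)-$(date +%Y%m%d%H%M).tar.gz \
      /var/log/ovirt-hosted-engine-setup \
      /var/log/ovirt-hosted-engine-ha \
      /var/log/vdsm \
      /var/log/libvirt/qemu/HostedEngine.log \
      /var/log/sanlock.log

plus /var/log/ovirt-engine/engine.log and /var/log/ovirt-engine/server.log from inside the hosted engine VM.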
If the setup design (wholly GlusterFS based) is somewhat flawed, please point me to some hints/docs/guides for the right way of setting it up on 2 standalone physical nodes, so as not to waste your time chasing "defects" in something that is not supposed to be working anyway.

I will follow your advice and try it accordingly.

Many thanks again,
Giuseppe
Many thanks,
Giuseppe
--
Martin Sivák
msivak@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ
----- Original Message -----
On Fri, Mar 07, 2014 at 10:17:43AM +0100, Sandro Bonazzola wrote:
On 07/03/2014 01:10, Jason Brooks wrote:
> Hey everyone, I've been testing out oVirt 3.4 w/ hosted engine, and
> while I've managed to bring the engine up, I've only been able to do it
> manually, using "hosted-engine --vm-start".
>
> The ovirt-ha-agent service fails reliably for me, erroring out with
> "RequestError: Request failed: success."
>
> I've pasted error passages from the ha agent and vdsm logs below.
>
> Any pointers?
looks like a VDSM bug, Dan?
Why? The exception is raised from deep inside the ovirt_hosted_engine_ha code.
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com