
Date: Tue, 11 Mar 2014 15:16:36 +0100
From: sbonazzo@redhat.com
To: giuseppe.ragusa@hotmail.com; jbrooks@redhat.com; msivak@redhat.com
CC: users@ovirt.org; fsimonce@redhat.com; gpadgett@redhat.com
Subject: Re: [Users] hosted engine help

On 10/03/2014 22:32, Giuseppe Ragusa wrote:
Hi all,
Date: Mon, 10 Mar 2014 12:56:19 -0400
From: jbrooks@redhat.com
To: msivak@redhat.com
CC: users@ovirt.org
Subject: Re: [Users] hosted engine help
----- Original Message -----
From: "Martin Sivak" <msivak@redhat.com> To: "Dan Kenigsberg" <danken@redhat.com> Cc: users@ovirt.org Sent: Saturday=2C March 8=2C 2014 11:52:59 PM Subject: Re: [Users] hosted engine help
Hi Jason,
can you please attach the full logs? We had a very similar issue before and we need to see whether it is the same one or not.
I may have to recreate it -- I switched back to an all-in-one engine after my setup started refusing to run the engine at all. It's no fun losing your engine!
This was a migrated-from-standalone setup, maybe that caused additional wrinkles...
Jason
Thanks
I experienced the exact same symptoms as Jason on a from-scratch installation on two physical nodes with CentOS 6.5 (fully up-to-date) using oVirt 3.4.0_pre (latest test-day release) and GlusterFS 3.5.0beta3 (with Gluster-provided NFS as storage for the self-hosted engine VM only).

Using GlusterFS with hosted-engine storage is not supported and not recommended. The HA daemon may not work properly there.
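For clarity, by "Gluster-provided NFS" I mean the volume's built-in NFS server, not the kernel NFS server; the engine volume was prepared roughly like this (the volume name and brick paths here are only illustrative):

  gluster volume create engine replica 2 node1:/gluster/engine/brick node2:/gluster/engine/brick
  gluster volume start engine

  # Gluster's NFS export should then be visible from both nodes
  showmount -e node1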
If it is unsupported (and particularly "not recommended") even with the interposed NFS (the native Gluster-provided NFSv3 export of a volume), then which is the recommended way to set up a fault-tolerant, load-balanced 2-node oVirt cluster (without an external dedicated SAN/NAS)?
I roughly followed the guide from Andrew Lau:

http://www.andrewklau.com/ovirt-hosted-engine-with-3-4-0-nightly/

with some variations due to newer packages (resolved bugs) and a different hardware setup (no VLANs in my setup: physically separated networks; a custom second NIC added to the Engine VM template before deploying, etc.)

The self-hosted installation on the first node + Engine VM (configured for managing both oVirt and the storage; Datacenter default set to NFS because no GlusterFS was offered) went apparently smoothly, but the HA-agent failed to start at the very end (same errors in the logs as Jason: the storage domain seems "missing") and I was only able to start it all manually with:

  hosted-engine --connect-storage
  hosted-engine --start-pool

The above commands are used for development and shouldn't be used for starting the engine.
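For reference, a minimal sketch of the non-development entry points as I understand them (please correct me if the intended flow is different):

  # ask the HA agents about the engine VM and its storage
  hosted-engine --vm-status

  # start the engine VM through the regular command instead of the pool commands above
  hosted-engine --vm-start

  # restart the HA services after fixing the storage (EL6 init scripts)
  service ovirt-ha-broker restart
  service ovirt-ha-agent restart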
Directly starting the engine (with the command below) failed because of storage unavailability, so I used the above "trick" as a "last resort" to at least prove that the engine was able to start and had not been somehow "destroyed" or "lost" in the process (but I do understand that it is an extreme debug-only action).
  hosted-engine --vm-start

then the Engine came up and I could use it; I even registered the second node (same final error in the HA-agent) and tried to add GlusterFS storage domains for further VMs and ISOs (by the way: the original NFS-GlusterFS domain for the Engine VM only is not present inside the Engine web UI), but it always failed activating the domains (they remain "Inactive").

Furthermore, the engine gets killed some time after starting (from 3 up to 11 hours later) and the only way to get it back is repeating the above commands.

Need logs for this.
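In case it helps for the next run, these are the checks I plan to capture whenever the storage looks unavailable, assuming a stock install where vdsm mounts the domains under /rhev/data-center/mnt:

  # is the hosted-engine NFS domain actually mounted by vdsm?
  mount | grep rhev
  ls /rhev/data-center/mnt/

  # is the Gluster NFS export reachable from the host at all?
  showmount -e node1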
I always managed GlusterFS "natively" (not through oVirt) from the command line and verified that the NFS-exported Engine-VM-only volume gets replicated, but I obviously failed to try migration because the HA part results inactive and oVirt refuses to migrate the Engine.

Since I tried many times, with variations and further manual actions in between (like trying to manually mount the NFS Engine domain, restarting the HA-agent only, etc.), my logs are "cluttered", so I should start from scratch again and pack up all logs in one swipe.

+1

;>

I will try to reproduce it all, but I can recall that in the libvirt logs (HostedEngine.log) there was always a clear indication of the PID that killed the Engine VM, and each time it belonged to an instance of sanlock.
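Next time it happens I will also double-check the sanlock angle with something like the following, assuming default log locations on EL6:

  # sanlock's own view of its lockspaces and held leases
  sanlock client status

  # look for the engine qemu process being terminated in the per-VM libvirt log
  grep -i terminat /var/log/libvirt/qemu/HostedEngine.log

  # and for lease renewal failures / kill decisions in sanlock's own log
  grep -iE "renewal|kill" /var/log/sanlock.log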
Tell me what I should capture and at which points in the whole process, and I will try to follow up as soon as possible.

What:
hosted-engine-setup, hosted-engine-ha, vdsm, libvirt and sanlock from the physical hosts, and engine and server logs from the hosted engine VM.

When:
As soon as you see an error.
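Something like the following should pack up everything listed above from a host in one swipe (paths assumed from a default EL6 install, adjust as needed):

  tar czf hosted-engine-logs-$(hostname)-$(date +%Y%m%d%H%M).tar.gz \
      /var/log/ovirt-hosted-engine-setup \
      /var/log/ovirt-hosted-engine-ha \
      /var/log/vdsm \
      /var/log/libvirt/qemu/HostedEngine.log \
      /var/log/sanlock.log

plus /var/log/ovirt-engine/engine.log and /var/log/ovirt-engine/server.log from inside the hosted engine VM.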
If the setup design (wholly GlusterFS based) is somewhat flawed, please point me to some hints/docs/guides for the right way of setting it up on 2 standalone physical nodes, so as not to waste your time chasing "defects" in something that is not supposed to be working anyway.

I will follow your advice and try it accordingly.

Many thanks again,
Giuseppe
Many thanks,
Giuseppe
--
Martin Sivák
msivak@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ
----- Original Message -----
On Fri, Mar 07, 2014 at 10:17:43AM +0100, Sandro Bonazzola wrote:
On 07/03/2014 01:10, Jason Brooks wrote:
> Hey everyone, I've been testing out oVirt 3.4 w/ hosted engine, and
> while I've managed to bring the engine up, I've only been able to do it
> manually, using "hosted-engine --vm-start".
>
> The ovirt-ha-agent service fails reliably for me, erroring out with
> "RequestError: Request failed: success."
>
> I've pasted error passages from the ha agent and vdsm logs below.
>
> Any pointers?
looks like a VDSM bug, Dan?
Why? The exception is raised from deep inside the ovirt_hosted_engine_ha code.
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com