Date: Tue, 11 Mar 2014 15:16:36 +0100
From: sbonazzo(a)redhat.com
To: giuseppe.ragusa(a)hotmail.com; jbrooks(a)redhat.com; msivak(a)redhat.com
CC: users(a)ovirt.org; fsimonce(a)redhat.com; gpadgett(a)redhat.com
Subject: Re: [Users] hosted engine help

On 10/03/2014 22:32, Giuseppe Ragusa wrote:
> Hi all,
>
>> Date: Mon, 10 Mar 2014 12:56:19 -0400
>> From: jbrooks(a)redhat.com
>> To: msivak(a)redhat.com
>> CC: users(a)ovirt.org
>> Subject: Re: [Users] hosted engine help
>>
>>
>>
>> ----- Original Message -----
>> > From: "Martin Sivak" <msivak(a)redhat.com>
>> > To: "Dan Kenigsberg" <danken(a)redhat.com>
>> > Cc: users(a)ovirt.org
>> > Sent: Saturday, March 8, 2014 11:52:59 PM
>> > Subject: Re: [Users] hosted engine help
>> >
>> > Hi Jason,
>> >
>> > Can you please attach the full logs? We had a very similar issue before
>> > and we need to see whether this one is the same or not.
>>
>> I may have to recreate it -- I switched back to an all-in-one engine after
>> my setup started refusing to run the engine at all. It's no fun losing your
>> engine!
>>
>> This was a migrated-from-standalone setup, maybe that caused additional
>> wrinkles...
>>
>> Jason
>>
>> >
>> > Thanks
>
> I experienced the exact same symptoms as Jason on a from-scratch installation
> on two physical nodes with CentOS 6.5 (fully up-to-date) using oVirt 3.4.0_pre
> (latest test-day release) and GlusterFS 3.5.0beta3 (with Gluster-provided NFS
> as storage for the self-hosted engine VM only).

Using GlusterFS as hosted-engine storage is neither supported nor recommended;
the HA daemon may not work properly there.

If it is unsupported (and particularly "not recommended") even with the
interposed NFS (the native Gluster-provided NFSv3 export of a volume), then
what is the recommended way to set up a fault-tolerant, load-balanced two-node
oVirt cluster (without an external dedicated SAN/NAS)?
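
For reference, this is roughly how I created and NFS-exported the engine
volume (a sketch from memory; the volume name and brick paths here are only
illustrative):

  # on the first node, GlusterFS 3.5.0beta3; names and paths are examples
  gluster peer probe node2
  gluster volume create engine replica 2 node1:/bricks/engine node2:/bricks/engine
  gluster volume set engine nfs.disable off   # keep the built-in NFSv3 export on
  gluster volume start engine
  # hosted-engine --deploy was then pointed at node1:/engine as plain NFS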
> I roughly followed the guide from Andrew Lau:
>
> http://www.andrewklau.com/ovirt-hosted-engine-with-3-4-0-nightly/
>
> with some variations due to newer packages (resolved bugs) and a different
> hardware setup (no VLANs in my setup: physically separated networks; a
> custom second NIC added to the Engine VM template before deploying, etc.)
>
> The self-hosted installation on the first node + Engine VM (configured for
> managing both oVirt and the storage; Datacenter default set to NFS because
> no GlusterFS was offered) went apparently smoothly, but the HA agent failed
> to start at the very end (same errors in the logs as Jason: the storage
> domain seems "missing") and I was only able to start it all manually with:
>
> hosted-engine --connect-storage
> hosted-engine --start-pool

The above commands are used for development and shouldn't be used for
starting the engine.

Directly starting the engine (with the command below) failed because of
storage unavailability, so I used the above "trick" as a "last resort", at
least to prove that the engine was still able to start and had not somehow
been "destroyed" or "lost" in the process (but I do understand that it is an
extreme, debug-only action).

> hosted-engine --vm-start
>
> Then the Engine came up and I could use it; I even registered the second
> node (same final error in the HA agent) and tried to add GlusterFS storage
> domains for further VMs and ISOs (by the way: the original NFS-on-GlusterFS
> domain for the Engine VM only is not present inside the Engine web UI), but
> it always failed to activate the domains (they remain "Inactive").
>
> Furthermore, the engine gets killed some time after starting (from 3 up to
> 11 hours later) and the only way to get it back is repeating the above
> commands.
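
In case it is useful, this is what I check each time it dies, before
restarting it (3.4 tooling; the exact option names are from memory):

  hosted-engine --vm-status         # HA view: engine state and host scores
  service ovirt-ha-agent status     # is the agent itself still running?
  service ovirt-ha-broker status
  vdsClient -s 0 list table         # does VDSM still know the HostedEngine VM?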

Need logs for this.

I will try to reproduce it all, but I recall that in the libvirt logs
(HostedEngine.log) there was always a clear indication of the PID that killed
the Engine VM, and each time it belonged to an instance of sanlock.
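
Next time it happens I will confirm it along these lines (default log paths
on CentOS 6.5; the grep pattern is a guess from memory):

  # the per-VM qemu log records the terminating signal and the sender's PID
  grep -i kill /var/log/libvirt/qemu/HostedEngine.log
  # map that PID to a process name (assuming it is still running)
  ps -p <PID> -o pid,comm,args
  # and cross-check sanlock's view of its lockspaces
  sanlock client status
  tail -n 50 /var/log/sanlock.log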

> I always managed GlusterFS "natively" (not through oVirt) from the command
> line and verified that the NFS-exported, Engine-VM-only volume gets
> replicated, but I obviously could not try migration because the HA part
> shows as inactive and oVirt refuses to migrate the Engine.
>
> Since I tried many times, with variations and further manual actions in
> between (like trying to manually mount the NFS Engine domain, restarting
> only the HA agent, etc.), my logs are "cluttered", so I should start from
> scratch again and pack up all the logs in one sweep.

+1

;>

> Tell me what I should capture and at which points in the whole process,
> and I will try to follow up as soon as possible.

What:
hosted-engine-setup, hosted-engine-ha, vdsm, libvirt and sanlock logs from
the physical hosts, plus the engine and server logs from the hosted-engine VM.

When:
As soon as you see an error.
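
Something like this should pack everything up in one pass (assuming default
log locations; adjust the paths if your install differs):

  # on each physical host
  tar czf host-logs-$(hostname).tar.gz \
      /var/log/ovirt-hosted-engine-setup \
      /var/log/ovirt-hosted-engine-ha \
      /var/log/vdsm \
      /var/log/libvirt \
      /var/log/sanlock.log
  # on the hosted-engine VM
  tar czf engine-logs.tar.gz /var/log/ovirt-engine /var/log/messages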

If the setup design (wholly GlusterFS-based) is somehow flawed, please point
me to some hints/docs/guides on the right way to set it up on two standalone
physical nodes, so as not to waste your time chasing "defects" in something
that is not supposed to work anyway.

I will follow your advice and try it accordingly.

Many thanks again,
Giuseppe
> Many thanks,
> Giuseppe
>
>> > --
>> > Martin Sivák
>> > msivak(a)redhat.com
>> > Red Hat Czech
>> > RHEV-M SLA / Brno=2C CZ
>> >
>> > ----- Original Message -----
>> > > On Fri, Mar 07, 2014 at 10:17:43AM +0100, Sandro Bonazzola wrote:
>> > > > On 07/03/2014 01:10, Jason Brooks wrote:
>> > > > > Hey everyone, I've been testing out oVirt 3.4 w/ hosted engine, and
>> > > > > while I've managed to bring the engine up, I've only been able to
>> > > > > do it manually, using "hosted-engine --vm-start".
>> > > > >
>> > > > > The ovirt-ha-agent service fails reliably for me, erroring out with
>> > > > > "RequestError: Request failed: success."
>> > > > >
>> > > > > I've pasted error passages from the ha agent and vdsm logs below.
>> > > > >
>> > > > > Any pointers?
>> > > >
>> > > > Looks like a VDSM bug, Dan?
>> > >
>> > > Why? The exception is raised from deep inside the ovirt_hosted_engine_ha
>> > > code.
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com