------=_Part_29483_11651952.1366120165705
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Yes, fencing must be working, otherwise HA does not work. So, to cover a power supply failure, do we need servers with redundant power supplies?
----- Original Message -----
From: "René Koch" <r.koch(a)ovido.at>
To: suporte(a)logicworks.pt, "Gianluca Cecchi" <gianluca.cecchi(a)gmail.com>
Cc: "users" <Users(a)ovirt.org>
Sent: Tuesday, 16 April 2013 13:31:48
Subject: RE: [Users] High Availability
-----Original message-----
From: suporte@logicworks.pt <suporte(a)logicworks.pt>
Sent: Tuesday 16th April 2013 14:03
To: Gianluca Cecchi <gianluca.cecchi(a)gmail.com>
Cc: René Koch <r.koch(a)ovido.at>; users <Users(a)ovirt.org>
Subject: Re: [Users] High Availability

Well, we had also disconnected the iLO NIC cable. We ran another test, disconnecting all NIC cables except the iLO one, and voilà: HA took about 3 minutes to migrate the VM to the other host. We also noticed that the manager rebooted the failed host. For a more realistic scenario, we disconnected the power cable from the host, and after about 2 or 3 minutes the manager put the host in the non-responsive state and the VM in the unknown state. Is this the correct behavior?

Fencing means that the non-responsive host gets reset (powered off and on).
If fencing isn't working (you disconnected the power cable, so iLO can't send a success message), the VMs won't get started on another host.

In your example this seems strange, but let's have a look at the following scenario:
- You have 2 datacenters with 1 hypervisor in DC 1 and 1 hypervisor in DC 2; ovirt-engine is running in DC 1
- The connection between the DCs is lost
- Fencing isn't working
- The VM is running on the host in DC 2
- If the VM were started on the host in DC 1 without successful fencing, your VM disk would be corrupted (the hosts in DC 2 and DC 1 would both be writing to the same storage file)
Maybe there are better examples than this one (it would be interesting to know what your storage metro-cluster does in this split-brain situation), but I hope it's clear why fencing works the way it does and what could happen if it were less restrictive...
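
To make the rule concrete, here is a rough Python sketch of the decision described above. It is only an illustration, not oVirt engine code: `Host`, `FenceResult` and `may_restart_ha_vms` are made-up names, and the real engine logic is certainly more involved.

```python
from dataclasses import dataclass
from enum import Enum


class FenceResult(Enum):
    # Hypothetical outcomes of a fencing attempt (names invented for this sketch).
    CONFIRMED_OFF = "confirmed_off"  # power management reported the host as off or rebooted
    NO_ANSWER = "no_answer"          # fence device unreachable, e.g. the host lost all power


@dataclass
class Host:
    name: str
    responsive: bool


def may_restart_ha_vms(failed: Host, fence: FenceResult,
                       manually_confirmed: bool = False) -> bool:
    """Return True only when starting the host's HA VMs elsewhere cannot cause split brain."""
    if failed.responsive:
        return False  # the host is still reachable, so nothing needs restarting
    if fence is FenceResult.CONFIRMED_OFF:
        return True   # the host is known to be down, so its VMs cannot still be writing
    # No confirmation from the fence device: only an admin's manual
    # "host has been rebooted" confirmation makes a restart safe.
    return manually_confirmed
```

With the power cable pulled, the fence device never answers, so the function keeps returning False until someone confirms the reboot by hand. That matches what was seen in the tests.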

Regards,
René

Regards
Jose

----- Original Message -----
From: "Gianluca Cecchi" <gianluca.cecchi(a)gmail.com>
To: suporte(a)logicworks.pt
Cc: "René Koch (ovido)" <r.koch(a)ovido.at>, "users" <Users(a)ovirt.org>
Sent: Tuesday, 16 April 2013 12:12:43
Subject: Re: [Users] High Availability

On Tue, Apr 16, 2013 at 12:56 PM, suporte wrote:
> Hi,
>
> We have 2 Fujitsu servers and one iSCSI storage domain. The servers have power management configured with ilo3.
> We can live migrate a VM, and when we reboot the host running that VM, it migrates to the other host.
>
> To test high availability we disconnected all NIC cables of the VM host. The VM did not migrate to the other host; we had to manually confirm that the host had been rebooted, and only then did the migration happen.
>
> Is this the correct behavior? Do we have to manually confirm that the host has been rebooted for HA to happen?
>
> Regards
> Jose

Hello,
when you say "we disconnected all NIC cables" you mean "we
disconnected all NIC cables except the ones connected to the iLO
interface", correct?
Because to know that it has successfully fenced the problematic
host, the fencing host has to send a get-status message and see that
the host is off or that it has been successfully rebooted.

For example, in RHCS, if you configure iLO as a fencing device, it
remains indefinitely in a state similar to

wait for fence to complete

if the "fencer" is not able to get an acknowledgement of the operation
or to reach the other node's iLO.
You can probably find something in your logs...
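
The fence-then-verify pattern described above could be sketched in Python roughly like this. It is an assumption-laden illustration, not RHCS or oVirt code: `fence_and_verify`, `power_off` and `get_status` are hypothetical names, the "off" status string is invented, and the loop gives up after a fixed number of attempts instead of waiting indefinitely as RHCS does.

```python
import time
from typing import Callable


def fence_and_verify(power_off: Callable[[], None],
                     get_status: Callable[[], str],
                     attempts: int = 5,
                     delay_s: float = 1.0) -> bool:
    """Ask the fence device to power the node off, then poll until it reports 'off'.

    Both callables are placeholders for whatever talks to the fence device;
    they are expected to raise OSError when the device cannot be reached.
    """
    try:
        power_off()
    except OSError:
        return False  # fence device unreachable, the request was never delivered
    for _ in range(attempts):
        try:
            if get_status() == "off":
                return True  # acknowledgement received, fencing succeeded
        except OSError:
            pass  # no answer from the fence device, keep waiting
        time.sleep(delay_s)
    return False  # never acknowledged, so fencing cannot be treated as done
```

If the iLO cable (or the power cable) is unplugged, every status request fails, the function returns False, and the fencer has no proof that the node is really down.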

Gianluca

------=_Part_29483_11651952.1366120165705--