------=_Part_3933742_65602238.1358067259763
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
----- Original Message -----
From: "Alexandru Vladulescu" <avladulescu(a)bfproject.ro>
To: "Doron Fediuck" <dfediuck(a)redhat.com>
Cc: "users" <users(a)ovirt.org>
Sent: Sunday, January 13, 2013 10:46:41 AM
Subject: Re: [Users] Testing High Availability and Power outages
Dear Doron,
I haven't collected the logs from the tests, but I would gladly re-do
the case and get back to you asap.
This feature is the main reason I chose to go with oVirt in the first
place over other virtualization environments.
Could you please tell me which logs I should focus on besides the
engine log: vdsm, maybe, or other relevant logs?
Regards,
Alex
--
Sent from phone.
On 13.01.2013, at 09:56, Doron Fediuck <dfediuck(a)redhat.com> wrote:
> ----- Original Message -----
> > From: "Alexandru Vladulescu" <avladulescu(a)bfproject.ro>
> > To: "users" <users(a)ovirt.org>
> > Sent: Friday, January 11, 2013 2:47:38 PM
> > Subject: [Users] Testing High Availability and Power outages
>
> > Hi,
>
> > Today I started testing the High Availability features and the
> > fence mechanism on my oVirt 3.1 installation (from the dreyou
> > repos), running on 3 x CentOS 6.3 hypervisors.
>
> > Since, as I reported yesterday in a previous email thread, the
> > migration priority queue cannot be increased (bug) in this version,
> > I decided to test what the official documentation says about the
> > High Availability cases.
>
> > It would be a disaster scenario if one hypervisor had a power
> > outage or hardware problem and the VMs running on it were not
> > migrated to other spare resources.
>
> > The official documentation on ovirt.org states the following:
>
> > High availability
>
> > Allows critical VMs to be restarted on another host in the event of
> > hardware failure with three levels of priority, taking into account
> > resiliency policy.
>
> > * Resiliency policy to control high availability VMs at the cluster
> > level.
>
> > * Supports application-level high availability with supported
> > fencing agents.
>
> > As well as in the Architecture description:
>
> > High Availability - restart guest VMs from failed hosts
> > automatically on other hosts
>
> > The test went like this: one VM running Linux, with the "High
> > Available" check box ticked and "Priority for Run/Migration queue:"
> > set to Low. Under Host, "Any Host in Cluster" is selected, and
> > "Allow VM migration only upon Admin specific request" is not
> > checked.
>
> > My environment:
>
> > Configuration: 2 x hypervisors (same cluster/hardware
> > configuration); 1 x hypervisor also acting as a NAS (NFS) server
> > (different cluster/hardware configuration)
>
> > Actions: I cut the power to one of the hypervisors in the 2-node
> > cluster while the VM was running on it, to simulate a power outage.
>
> > Results: The hypervisor node that suffered the outage shows as Non
> > Responsive in the Status column of the Hosts tab, and the VM shows
> > a question mark and cannot be powered off or otherwise acted on (so
> > it is stuck).
>
> > In the Log console in the GUI, I get:
>
> > Host Hyper01 is non-responsive.
>
> > VM Web-Frontend01 was set to the Unknown status.
>
> > There is nothing I could do besides clicking "Confirm Host has been
> > rebooted" on Hyper01; afterwards the VM starts on Hyper02 with a
> > cold reboot.
>
> > The Log console changes to:
>
> > Vm Web-Frontend01 was shut down due to Hyper01 host reboot or
> > manual fence
>
> > All VMs' status on Non-Responsive Host Hyper01 were changed to
> > 'Down' by admin@internal
>
> > Manual fencing for host Hyper01 was started.
>
> > VM Web-Frontend01 was restarted on Host Hyper02
>
> > I would like your view on this problem. From reading the
> > documentation and feature pages on the official website, I assumed
> > this would be an automatic mechanism driven by some sort of vdsm
> > and engine fencing action. Am I missing something?
>
> > Thank you for your patience reading this.
>
> > Regards,
>
> > Alex.
>
> > _______________________________________________
> > Users mailing list
> > Users(a)ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
>
> Hi Alex,
> Can you share with us the engine's log from the relevant time
> period?
> Doron
Hi Alex,
The engine log is the important one, as it will show the decision-making
process. Keep the VDSM logs in case something is unclear, but I suggest
we begin with engine.log.
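To narrow engine.log down to the relevant time period before sharing it, something along these lines may help. It is only a minimal sketch: it assumes each log entry starts with a timestamp of the form "YYYY-MM-DD HH:MM:SS" (check this against your own file), and the function name filter_log and its parameters are made up for illustration.

```python
import re
from datetime import datetime

# Assumed entry prefix, e.g. "2013-01-11 14:47:38,..."; adjust if your log differs.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def filter_log(lines, start, end, keywords=()):
    """Yield entries stamped within [start, end] that mention any keyword.

    Keyword matching is case-insensitive; with no keywords, every entry in
    the window is kept. Lines without a leading timestamp (for example
    stack-trace continuations) follow the decision made for the entry
    they belong to.
    """
    lowered = [k.lower() for k in keywords]
    keep = False
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            in_window = start <= stamp <= end
            text = line.lower()
            keep = in_window and (not lowered or any(k in text for k in lowered))
        if keep:
            yield line
```

To use it, open the engine log (on a default install it is usually /var/log/ovirt-engine/engine.log, but verify the path on your system) and pass the file object as lines, together with two datetime values around the power cut and keywords such as "Hyper01" or "Web-Frontend01".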
------=_Part_3933742_65602238.1358067259763--