
----- Original Message -----
From: "Alexandru Vladulescu" <avladulescu@bfproject.ro>
To: "Doron Fediuck" <dfediuck@redhat.com>
Cc: "users" <users@ovirt.org>
Sent: Sunday, January 13, 2013 10:46:41 AM
Subject: Re: [Users] Testing High Availability and Power outages
Dear Doron,
I haven't collected the logs from the tests, but I would gladly redo the case and get back to you as soon as possible.
This feature is the main reason I chose oVirt in the first place over other virtualization environments.
Could you please tell me which logs I should focus on besides the engine log: VDSM, perhaps, or other relevant logs?
Regards, Alex
-- Sent from phone.
On 13.01.2013, at 09:56, Doron Fediuck <dfediuck@redhat.com> wrote:
----- Original Message -----
From: "Alexandru Vladulescu" <avladulescu@bfproject.ro>
To: "users" <users@ovirt.org>
Sent: Friday, January 11, 2013 2:47:38 PM
Subject: [Users] Testing High Availability and Power outages
Hi,
Today I started testing the High Availability features and the fence mechanism on my oVirt 3.1 installation (from the dreyou repos), running on 3 x CentOS 6.3 hypervisors.
As I reported yesterday in a previous email thread, the migration priority queue cannot be increased in this version (a bug), so I decided to test what the official documentation says about the High Availability cases.
It would be a disaster scenario if one hypervisor had a power outage or hardware problem and the VMs running on it did not migrate to other spare resources.
The official documentation on ovirt.org states the following:
High availability
Allows critical VMs to be restarted on another host in the event of hardware failure with three levels of priority, taking into account resiliency policy.
* Resiliency policy to control high availability VMs at the cluster level.
* Supports application-level high availability with supported fencing agents.
As well as in the Architecture description:
High Availability - restart guest VMs from failed hosts automatically on other hosts
So the testing went like this: one VM running Linux, with the "High Available" check box ticked and "Priority for Run/Migration queue:" set to Low. Under Host, "Any Host in Cluster" is selected, and "Allow VM migration only upon Admin specific request" is not checked.
My environment:
Configuration: 2 x hypervisors (same cluster/hardware configuration); 1 x hypervisor also acting as a NAS (NFS) server (different cluster/hardware configuration)
Actions: I cut the power to one of the hypervisors in the 2-node cluster while the VM was running on it. This simulates a power outage.
Results: The hypervisor node that suffered the outage shows in the Hosts tab with status Non Responsive, and the VM shows a question mark and cannot be powered off or otherwise acted on (it is stuck).
In the Log console in the GUI, I get:
Host Hyper01 is non-responsive.
VM Web-Frontend01 was set to the Unknown status.
There is nothing I could do besides clicking "Confirm Host has been rebooted" on Hyper01; afterwards the VM starts on Hyper02 with a cold reboot.
The Log console changes to:
Vm Web-Frontend01 was shut down due to Hyper01 host reboot or manual fence
All VMs' status on Non-Responsive Host Hyper01 were changed to 'Down' by admin@internal
Manual fencing for host Hyper01 was started.
VM Web-Frontend01 was restarted on Host Hyper02
I would like your view on this problem. From reading the documentation and features pages on the official website, I assumed this would be an automatic mechanism driven by some sort of VDSM and engine fencing action. Am I missing something?
Thank you for your patience in reading this.
Regards,
Alex.
_______________________________________________
Users mailing list
Users@ovirt.org
Hi Alex,
Can you share with us the engine's log from the relevant time period?
Doron
Hi Alex,
The engine log is the important one, as it shows the decision-making process.
VDSM logs should be kept in case something is unclear, but I suggest we begin with engine.log.
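For sharing only the relevant time period of engine.log, a minimal Python sketch follows. It assumes each log entry starts with a timestamp of the form `2013-01-11 14:47:38,123`; that prefix format and the helper name `lines_in_window` are assumptions for illustration, not something confirmed in this thread, so adjust them to match the actual file.

```python
from datetime import datetime
from typing import Iterable, Iterator

# Assumed timestamp prefix at the start of each log entry (not confirmed here).
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LEN = 19  # length of "YYYY-MM-DD HH:MM:SS"


def lines_in_window(lines: Iterable[str], start: datetime, end: datetime) -> Iterator[str]:
    """Yield log lines whose leading timestamp lies within [start, end].

    Lines without a parsable timestamp (for example stack-trace
    continuation lines) are kept only if the preceding stamped
    line was inside the window.
    """
    keep = False
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            # No timestamp: treat as a continuation of the previous entry.
            if keep:
                yield line
            continue
        keep = start <= ts <= end
        if keep:
            yield line
```

To use it, open the engine log (commonly /var/log/ovirt-engine/engine.log, though the path may differ per installation), pass the file object together with a start and end `datetime` bracketing the moment the power was cut, and write the yielded lines to a new file to attach to the reply.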