
Hi,

Today I started testing the High Availability features and the fencing mechanism on my oVirt 3.1 installation (from the dreyou repos), running on 3 x CentOS 6.3 hypervisors.

As I reported yesterday in a previous email thread, the migration priority queue cannot be increased in the current version (a bug), so I decided to test what the official documentation says about the High Availability cases.

It would be a disaster scenario if one hypervisor suffered a power outage or hardware problem and the VMs running on it were not migrated to other spare resources.

The official documentation on ovirt.org states the following:

    High availability

    Allows critical VMs to be restarted on another host in the event of
    hardware failure with three levels of priority, taking into account
    resiliency policy.

      * Resiliency policy to control high availability VMs at the cluster
        level.
      * Supports application-level high availability with supported
        fencing agents.

And the Architecture description says:

    High Availability - restart guest VMs from failed hosts automatically
    on other hosts

The test went like this: one VM running a Linux box, with the "High Available" check box ticked and "Priority for Run/Migration queue:" set to Low. On the Host tab, "Any Host in Cluster" is selected and "Allow VM migration only upon Admin specific request" is not checked.

My environment:

Configuration: 2 x hypervisors (same cluster/hardware configuration); 1 x hypervisor also acting as a NAS (NFS) server (different cluster/hardware configuration).

Actions: I cut off the power to one of the hypervisors in the 2-node cluster while the VM was running on it, to simulate a power outage.
Results: The hypervisor node that suffered the outage is shown in the Hosts tab with the status Non Responsive, and the VM shows a question mark and cannot be powered off or managed in any other way (it is stuck).

In the Log console in the GUI, I get:

    Host Hyper01 is non-responsive.
    VM Web-Frontend01 was set to the Unknown status.

There was nothing I could do besides clicking "Confirm Host has been rebooted" on Hyper01. Afterwards, the VM was started on Hyper02 with a cold reboot.

The Log console then changes to:

    Vm Web-Frontend01 was shut down due to Hyper01 host reboot or manual fence
    All VMs' status on Non-Responsive Host Hyper01 were changed to 'Down' by admin@internal
    Manual fencing for host Hyper01 was started.
    VM Web-Frontend01 was restarted on Host Hyper02

I would like your view on this problem. Having read the documentation and feature pages on the official website, I assumed this would be an automatic mechanism driven by some sort of VDSM and engine fencing action. Am I missing something?

Thank you for your patience in reading this.

Regards,
Alex.