
----- Original Message -----
From: "Alexandru Vladulescu" <avladulescu@bfproject.ro>
To: "users" <users@ovirt.org>
Sent: Friday, January 11, 2013 2:47:38 PM
Subject: [Users] Testing High Availability and Power outages
Hi,
Today I started testing the High Availability features and the fencing mechanism on my oVirt 3.1 installation (from the dreyou repos), running on 3 x CentOS 6.3 hypervisors.
Since, as I reported yesterday in a previous email thread, the migration priority queue cannot be increased in this version (a bug), I decided to test what the official documentation says about the High Availability cases.
This would be a disaster scenario if one hypervisor suffered a power outage or hardware failure and the VMs running on it did not migrate to other spare resources.
The official documentation on ovirt.org states the following: High availability
Allows critical VMs to be restarted on another host in the event of hardware failure with three levels of priority, taking into account resiliency policy.
* Resiliency policy to control high availability VMs at the cluster level.
* Supports application-level high availability with supported fencing agents.
As well as in the Architecture description:
High Availability - restart guest VMs from failed hosts automatically on other hosts
So the testing went like this: one VM running a Linux box, with the "High Available" check box ticked and "Priority for Run/Migration queue:" set to Low. On the Host tab, "Any Host in Cluster" is selected, and "Allow VM migration only upon Admin specific request" is left unchecked.
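For reference, the same HA settings can also be expressed through the engine's REST API. A minimal sketch of the request body, assuming the field names from the oVirt 3.x API schema; the target URL and the priority integer (a low value corresponding to the GUI's "Low") are illustrative:

```xml
<!-- Hypothetical request: PUT /api/vms/{vm:id} against the engine,
     with Content-Type: application/xml. -->
<vm>
  <!-- Mark the VM highly available with a low restart priority. -->
  <high_availability>
    <enabled>true</enabled>
    <priority>1</priority>
  </high_availability>
  <!-- "Any Host in Cluster" with migration allowed. -->
  <placement_policy>
    <affinity>migratable</affinity>
  </placement_policy>
</vm>
```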
My environment:
Configuration: 2 x hypervisors (same cluster/hardware configuration); 1 x hypervisor also acting as a NAS (NFS) server (different cluster/hardware configuration)
Actions: cut off the power to one of the hypervisors in the two-node cluster while the VM was running on it. This translates to a power outage.
Results: the hypervisor node that suffered the outage shows up in the Hosts tab with status Non Responsive, and the VM has a question mark and cannot be powered off or otherwise controlled (it is stuck).
In the Log console in GUI, I get:
Host Hyper01 is non-responsive. VM Web-Frontend01 was set to the Unknown status.
There is nothing I could do besides clicking "Confirm Host has been rebooted" on Hyper01; afterwards the VM starts on Hyper02 with a cold reboot.
The Log console changes to:
Vm Web-Frontend01 was shut down due to Hyper01 host reboot or manual fence
All VMs' status on Non-Responsive Host Hyper01 were changed to 'Down' by admin@internal
Manual fencing for host Hyper01 was started.
VM Web-Frontend01 was restarted on Host Hyper02
I would like your take on this problem. Reading the documentation & features pages on the official website, I supposed this would have been an automatic mechanism driven by some sort of vdsm & engine fencing action. Am I missing something here?
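Incidentally, a quick way to see what the engine decided during the outage window is to grep its log for fence-related events. A minimal sketch, assuming a default install keeps the log at /var/log/ovirt-engine/engine.log; demonstrated here on an inline sample resembling the GUI events above:

```shell
# On a real engine host you would run something like:
#   grep -iE 'fence|non.?responsive' /var/log/ovirt-engine/engine.log
# Here the same filter is shown against an inline sample, so the
# matching behavior is visible without an engine at hand.
grep -iE 'fence|non.?responsive' <<'EOF'
Host Hyper01 is non-responsive.
VM Web-Frontend01 was set to the Unknown status.
Manual fencing for host Hyper01 was started.
EOF
```

Only the first and third sample lines match the filter; the "Unknown status" line is dropped, which is the point: fence decisions and non-responsive transitions stand out from the rest of the log.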
Thank you for your patience reading this.
Regards, Alex.
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Hi Alex,
Can you share with us the engine's log from the relevant time period?

Doron