Re: [Users] Testing High Availability and Power outages

14 Jan 2013

On 01/14/2013 10:13 AM, Doron Fediuck wrote:
...
------------------------------------------------------------------------
*From: *"Alexandru Vladulescu" <avladulescu@bfproject.ro>
*To: *"Doron Fediuck" <dfediuck@redhat.com>
*Cc: *"users" <users@ovirt.org>
*Sent: *Sunday, January 13, 2013 9:49:25 PM
*Subject: *Re: [Users] Testing High Availability and Power outages
Dear Doron,
I have retested the case and am writing to you with the results.
In case this information is useful to you, my network setup is as
follows: two Layer 2 switches (Zyxel es2108-g and ES2200-8) configured
with two VLANs (one internal backbone network, added to oVirt as br0;
one outside network, running on the ovirtmgmt interface and carrying
Internet traffic to the VMs). The backbone switch is gigabit-capable,
and each host uses jumbo frames. One more firewall server routes the
subnets through a trunk port and VLAN configuration. oVirt has been set
up on the backbone network subnet.
As you can guess, the network infrastructure is not the problem here.
The test case was the same as described before:
1. VM running on Hyper01, none on Hyper02. The VM had the "High Available" check box enabled.
2. Hyper01 was powered off by cutting it from the power network (no soft/manual shutdown).
3. After a while, oVirt marks Hyper01 as Non Responsive.
4. I manually clicked "Confirm Host has been rebooted", and after oVirt's manual fence of Hyper01 the VM starts on the Hyper02 host.
I have attached the engine log. The host reboot was confirmed at
exactly 21:31:45. In the cluster section of oVirt, I also tried
changing the "Resilience Policy" attribute from "Migrate Virtual
Machines" to "Migrate only High Available Virtual Machines", with the
same results.
Judging from the engine log, I guess the Node Controller sees Hyper01
as having a "network fault" (no route to host), even though the host
was shut down.
Is this the expected default behavior in this case, given that the
scenario could look exactly like a real network outage?
Regards,
Alex.
On 01/13/2013 10:54 AM, Doron Fediuck wrote:
------------------------------------------------------------------------
*From: *"Alexandru Vladulescu" <avladulescu@bfproject.ro>
*To: *"Doron Fediuck" <dfediuck@redhat.com>
*Cc: *"users" <users@ovirt.org>
*Sent: *Sunday, January 13, 2013 10:46:41 AM
*Subject: *Re: [Users] Testing High Availability and Power outages
Dear Doron,
I haven't collected the logs from the tests, but I will gladly redo
the case and get back to you asap.
This feature is the main reason I chose oVirt over other
virtualization environments in the first place.
Could you please tell me which logs I should focus on besides the
engine log: VDSM, or other relevant logs?
Regards,
Alex
--
Sent from phone.
On 13.01.2013, at 09:56, Doron Fediuck <dfediuck@redhat.com> wrote:
------------------------------------------------------------------------
*From: *"Alexandru Vladulescu" <avladulescu@bfproject.ro>
*To: *"users" <users@ovirt.org>
*Sent: *Friday, January 11, 2013 2:47:38 PM
*Subject: *[Users] Testing High Availability and Power outages
Hi,
Today I started testing the High Availability features and the fencing
mechanism on my oVirt 3.1 installation (from the dreyou repos), running
on 3 x CentOS 6.3 hypervisors.
As I reported yesterday in a previous email thread, the migration
priority queue cannot be increased in the current version (a bug), so I
decided to test what the official documentation says about the High
Availability cases.
It would be a disaster scenario if one hypervisor suffered a power
outage or hardware problem and the VMs running on it did not migrate to
other spare resources.
The official documentation on ovirt.org says the following:
/High availability/
/Allows critical VMs to be restarted on another host in the event of hardware failure with three levels of priority, taking into account resiliency policy./
* /Resiliency policy to control high availability VMs at the cluster level./
* /Supports application-level high availability with supported fencing agents./
And in the Architecture description:
/High Availability - restart guest VMs from failed hosts automatically on other hosts/
The test went like this: one VM running a Linux box, with the "High
Available" check box enabled and "Priority for Run/Migration queue:"
set to Low. On the Host tab, "Any Host in Cluster" is selected, and
"Allow VM migration only upon Admin specific request" is not checked.
My environment:
Configuration: 2 x hypervisors (same cluster/hardware configuration);
1 x hypervisor also acting as a NAS (NFS) server (different
cluster/hardware configuration)
Actions: I cut off the power to one of the hypervisors in the 2-node
cluster while the VM was running on it, simulating a power outage.
Results: The hypervisor node that suffered the outage shows a status of
Non Responsive in the Hosts tab, and the VM shows a question mark and
cannot be powered off or otherwise managed (it is stuck).
In the GUI's log console, I get:
Host Hyper01 is non-responsive.
VM Web-Frontend01 was set to the Unknown status.
There was nothing I could do besides clicking "Confirm Host has been
rebooted" on Hyper01; afterwards the VM starts on Hyper02 with a cold
reboot.
The log console changes to:
Vm Web-Frontend01 was shut down due to Hyper01 host reboot or manual fence
All VMs' status on Non-Responsive Host Hyper01 were changed to 'Down' by admin@internal
Manual fencing for host Hyper01 was started.
VM Web-Frontend01 was restarted on Host Hyper02
I would like your view on this problem. Having read the documentation
and feature pages on the official website, I assumed this would be an
automatic mechanism based on some sort of VDSM and engine fencing
action. Am I missing something?
Thank you for your patience reading this.
Regards,
Alex.
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
Hi Alex,
Can you share the engine log from the relevant time period with us?
Doron
Hi Alex,
The engine log is the important one, as it shows the decision-making
process. The VDSM logs should be kept in case something is unclear, but
I suggest we begin with engine.log.
Hi Alex,
In order to have HA working at the host level (which is what you're
testing now), you need to configure power management for each of the
relevant hosts (go to the Hosts main tab, right-click a host and choose
Edit, then select the Power Management tab and you'll see it). From the
details you gave us it's not clear how you defined power management for
your hosts, so I can only assume it's not defined properly.
The reason this is necessary is that we cannot resume a VM on a
different host before we have verified the original host's status. If,
for example, the VM is still running on the original host and we have
only lost network connectivity to it, we risk running the same VM on 2
different hosts at the same time, which will corrupt its disk(s). So the
only way to prevent this is to reboot the original host, which ensures
the VM is not running there. We call this reboot procedure fencing, and
if you check your logs you'll see:
2013-01-13 21:29:42,380 ERROR [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-3-thread-44) [a1803d1] Failed to run Fence script on vds:Hyper01, VMs moved to UnKnown instead.
So the only way for you to handle it is to confirm that the host was
rebooted (as you did), which allows the VM to be resumed on a different
host.
Doron
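The fencing logic described above can be summarized as a small decision procedure. The Python below is a minimal sketch of that reasoning only, not oVirt engine code: `Vm`, `handle_non_responsive_host`, `confirm_host_rebooted`, the `fence` callable and the status strings are hypothetical names chosen for illustration. What comes from the thread is the behavior: try to fence first; if fencing fails, move the host's VMs to Unknown; release them only after a successful fence or a manual confirmation, at which point highly available VMs are restarted elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Vm:
    # Hypothetical VM record for illustration (not an oVirt type).
    name: str
    highly_available: bool
    host: str
    status: str = "Up"


def _release_vms(host: str, vms: List[Vm], spare_host: str) -> None:
    # Called only once we KNOW the VMs are not running on `host`:
    # every VM there goes Down, and HA VMs are restarted on the spare host.
    for vm in vms:
        if vm.host == host:
            vm.status = "Down"
            if vm.highly_available:
                vm.host, vm.status = spare_host, "Up"


def handle_non_responsive_host(host: str, vms: List[Vm],
                               fence: Callable[[str], bool],
                               spare_host: str) -> None:
    # `fence` stands in for a power-management reboot (e.g. via IPMI) and
    # returns True only if the host was verifiably power-cycled.
    if fence(host):
        _release_vms(host, vms, spare_host)
    else:
        # The host may still be running its VMs behind a network fault;
        # starting them elsewhere could corrupt their disks, so wait.
        for vm in vms:
            if vm.host == host:
                vm.status = "Unknown"


def confirm_host_rebooted(host: str, vms: List[Vm], spare_host: str) -> None:
    # Models the manual "Confirm Host has been rebooted" action: the admin
    # vouches that the host is down, which replaces the failed fence.
    _release_vms(host, vms, spare_host)


if __name__ == "__main__":
    vms = [Vm("Web-Frontend01", highly_available=True, host="Hyper01")]
    # No working power management, as in the test above: fencing fails.
    handle_non_responsive_host("Hyper01", vms, lambda h: False, "Hyper02")
    print(vms[0].status)               # Unknown
    confirm_host_rebooted("Hyper01", vms, "Hyper02")
    print(vms[0].status, vms[0].host)  # Up Hyper02
```

With `fence` always returning False, the VM stays in Unknown until the manual confirmation, which mirrors the log console sequence reported in the thread. The model is deliberately conservative: an unreachable host is never treated as a dead one without proof.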
Hi Doron,

Regarding your reply: I don't have such a fencing mechanism through an
IMM or iLO interface, as the hardware I am using doesn't support that
kind of IPMI technology. Your response makes me seriously consider
getting an add-on card that can perform the basic reboot, restart and
reset functions for our hardware.

Thank you very much for your advice on this.

Alex
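Once IPMI-capable hardware is in place, it can help to check by hand that the card actually answers before entering its address and credentials in the Power Management tab. The sketch below assumes the `fence_ipmilan` tool from the standard fence-agents package (its `-a`, `-l`, `-p`, `-P` and `-o` options); whether and how a given oVirt version invokes it is an assumption to verify. The helper name `fence_ipmilan_command`, the address 192.0.2.10 and the credentials are hypothetical placeholders. It only builds the command line and runs nothing.

```python
from typing import List


def fence_ipmilan_command(ip: str, user: str, password: str,
                          action: str = "status",
                          lanplus: bool = True) -> List[str]:
    """Build (but do not run) a fence_ipmilan invocation for manual testing.

    Hypothetical helper; `status` is read-only, while `reboot` or `off`
    would actually power-cycle the host.
    """
    allowed = {"status", "on", "off", "reboot"}
    if action not in allowed:
        raise ValueError(f"unsupported action: {action!r}")
    cmd = ["fence_ipmilan", "-a", ip, "-l", user, "-p", password]
    if lanplus:
        cmd.append("-P")  # use an IPMI v2.0 (lanplus) session
    cmd += ["-o", action]
    return cmd


if __name__ == "__main__":
    # Placeholder BMC address and credentials for a host such as Hyper01.
    print(" ".join(fence_ipmilan_command("192.0.2.10", "admin", "secret")))
    # fence_ipmilan -a 192.0.2.10 -l admin -p secret -P -o status
```

On a machine with fence-agents installed, the returned list can be passed to `subprocess.run`. Only `status` is safe to try on a host with running VMs; also note that a password given with `-p` is visible in the process list.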
