<div dir="ltr"><div>Thank you all for your valuable feedback.<br><br></div>Could you please list some of the fencing devices supported in oVirt?<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Jan 27, 2014 at 9:10 PM, Eli Mesika <span dir="ltr">&lt;<a href="mailto:emesika@redhat.com" target="_blank">emesika@redhat.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
----- Original Message -----<br>
&gt; From: &quot;Tareq Alayan&quot; &lt;<a href="mailto:talayan@redhat.com">talayan@redhat.com</a>&gt;<br>
&gt; To: &quot;Andrew Lau&quot; &lt;<a href="mailto:andrew@andrewklau.com">andrew@andrewklau.com</a>&gt;, &quot;Eli Mesika&quot; &lt;<a href="mailto:emesika@redhat.com">emesika@redhat.com</a>&gt;<br>
&gt; Cc: <a href="mailto:dron@redhat.com">dron@redhat.com</a>, &quot;Karli Sjöberg&quot; &lt;<a href="mailto:Karli.Sjoberg@slu.se">Karli.Sjoberg@slu.se</a>&gt;, <a href="mailto:users@ovirt.org">users@ovirt.org</a><br>
&gt; Sent: Monday, January 27, 2014 2:59:02 PM<br>
&gt; Subject: Re: [Users] two node ovirt cluster with HA<br>
&gt;<br>
&gt; Adding Eli.<br>
<br>
I just want to summarize the requirement as I understand it:<br>
<br>
In the case that a host that is running HA VMs and has PM configured is turned off manually:<br>
<br>
1) The non-responsive treatment should be modified to check Host status via PM agent<br>
2) If the host is off, HA VMs will attempt to run on another host ASAP<br>
3) The host status should be set to DOWN<br>
4) No attempt to restart vdsm (soft fencing) or restart the host (hard fencing) will be done<br>
<br>
Is the above correct? If so, an RFE for that can be opened.<br>
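<br>
To make the flow in points 1-4 above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: PowerStatus, plan_non_responsive_treatment and the action strings are hypothetical names, not engine code or an oVirt API, and the fallback branch simply stands in for the existing non-responsive treatment.<br>
<pre>
```python
from enum import Enum


class PowerStatus(Enum):
    """What the PM agent reports for the host (hypothetical values)."""
    ON = "on"
    OFF = "off"
    UNKNOWN = "unknown"  # PM agent unreachable or gave no clear answer


def plan_non_responsive_treatment(pm_status, ha_vms):
    """Return the ordered list of actions for a non-responsive host.

    pm_status is the status obtained from the PM agent (point 1);
    ha_vms is the list of HA VM names that were running on the host.
    """
    if pm_status is PowerStatus.OFF:
        # Points 2-4: the host is confirmed off, so HA VMs are started
        # elsewhere, the host is marked DOWN, and no fencing is attempted.
        actions = ["run HA VM {} on another host".format(vm) for vm in ha_vms]
        actions.append("set host status to DOWN")
        return actions
    # Host is on, or PM cannot tell: placeholder for the existing treatment.
    return ["existing non-responsive treatment (soft or hard fencing)"]


if __name__ == "__main__":
    for step in plan_non_responsive_treatment(PowerStatus.OFF, ["vm1", "vm2"]):
        print(step)
```
</pre>
The sketch returns a plan instead of acting on it, so the decision can be checked separately from whatever would actually start VMs or change the host status.<br>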
<div class="im"><br>
&gt;<br>
&gt;<br>
&gt; On 01/27/2014 02:50 PM, Andrew Lau wrote:<br>
</div><div class="im">&gt; &gt; Hi,<br>
&gt; &gt;<br>
&gt; &gt; I think he was asking what if the power management device reported<br>
&gt; &gt; that the host was powered off. Then VMs should be brought back up as<br>
&gt; &gt; being off would essentially be the same as running a power cycle/reboot?<br>
&gt; &gt;<br>
&gt; &gt; Another example I&#39;m seeing is what happens if the whole host loses<br>
&gt; &gt; power and its power management device then becomes unavailable (i.e.<br>
&gt; &gt; not reachable) then you&#39;re stuck in the case where it requires manual<br>
&gt; &gt; intervention.<br>
&gt; &gt;<br>
&gt; &gt; I would be interested to potentially see something like a timeout on<br>
&gt; &gt; those problematic VMs (e.g. if nothing was read or written after x amount<br>
&gt; &gt; of time) then you could consider the host as offline? I guess then<br>
&gt; &gt; that adds a lot of risk..<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt; On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan &lt;<a href="mailto:talayan@redhat.com">talayan@redhat.com</a>&gt; wrote:<br>
</div><div><div class="h5">
&gt; &gt;<br>
&gt; &gt;     Hi,<br>
&gt; &gt;<br>
&gt; &gt;     Power management makes use of special *dedicated* hardware in<br>
&gt; &gt;     order to restart hosts independently of the host OS. The engine<br>
&gt; &gt;     connects to a power management device using a *dedicated* network<br>
&gt; &gt;     IP address.<br>
&gt; &gt;     The engine is capable of rebooting hosts that have entered a<br>
&gt; &gt;     non-operational or non-responsive state.<br>
&gt; &gt;     The abilities provided by all power management devices are: check<br>
&gt; &gt;     status, start, stop and recycle (restart)...<br>
&gt; &gt;<br>
&gt; &gt;     In the case of a non-responsive host: all of the VMs that are<br>
&gt; &gt;     currently running on that host can also become non-responsive.<br>
&gt; &gt;     However, the non-responsive host keeps locking the VM hard disk<br>
&gt; &gt;     for all VMs it is running. Attempting to start a VM on a different<br>
&gt; &gt;     host and assigning the second host write privileges for the virtual<br>
&gt; &gt;     machine hard disk image can cause data corruption.<br>
&gt; &gt;     Rebooting allows the engine to assume that the lock on a VM hard<br>
&gt; &gt;     disk image has been released.<br>
&gt; &gt;     The engine can know for sure that the problematic host has been<br>
&gt; &gt;     rebooted via the power management device and then it can start a<br>
&gt; &gt;     VM from the problematic host on another host without risking data<br>
&gt; &gt;     corruption.<br>
&gt; &gt;     Important note: A virtual machine that has been marked<br>
&gt; &gt;     highly-available cannot be safely started on a different host<br>
&gt; &gt;     without the certainty that doing so will not cause data corruption.<br>
&gt; &gt;<br>
&gt; &gt;     N-joy,<br>
&gt; &gt;<br>
&gt; &gt;     --Tareq<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;     On 01/27/2014 02:05 PM, Dafna Ron wrote:<br>
&gt; &gt;<br>
&gt; &gt;         I am adding Tareq for the Power Management implementation.<br>
&gt; &gt;<br>
&gt; &gt;         Dafna<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;         On 01/27/2014 11:48 AM, Karli Sjöberg wrote:<br>
&gt; &gt;<br>
&gt; &gt;             On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:<br>
&gt; &gt;<br>
&gt; &gt;                 Powering off the host will never trigger VM migration.<br>
&gt; &gt;                 As far as the engine is concerned, it just lost connection<br>
&gt; &gt;                 to the host, but has no way of telling if the host is<br>
&gt; &gt;                 down or if a router is down.<br>
&gt; &gt;<br>
&gt; &gt;             Can&#39;t it at least check with power management if the Host<br>
&gt; &gt;             status is down first?<br>
&gt; &gt;<br>
&gt; &gt;             I mean, if the network is down there will be no response<br>
&gt; &gt;             from either PM<br>
&gt; &gt;             or Host. But if PM is up and can tell you that the Host is<br>
&gt; &gt;             down, sounds<br>
&gt; &gt;             rather clear cut to me...<br>
&gt; &gt;<br>
&gt; &gt;             Seems to me the VMs would be restarted sooner if the flow<br>
&gt; &gt;             was altered to first check with PM if it&#39;s a network or<br>
&gt; &gt;             Host issue, and if it is a Host issue, immediately restart<br>
&gt; &gt;             the VMs on another Host, instead of waiting for a<br>
&gt; &gt;             potentially problematic Host to boot up eventually.<br>
&gt; &gt;<br>
&gt; &gt;             /K<br>
&gt; &gt;<br>
&gt; &gt;                 Since VMs can continue running on the host even if the<br>
&gt; &gt;                 engine has no access to it, starting the VMs on the<br>
&gt; &gt;                 second host can cause split brain and data corruption.<br>
&gt; &gt;<br>
&gt; &gt;                 The way that the engine knows what&#39;s going on is by<br>
&gt; &gt;                 sending health check queries to vdsm.<br>
&gt; &gt;                 Power management will try to reboot a host when the<br>
&gt; &gt;                 health checks to vdsm are not answered.<br>
&gt; &gt;                 So... if the engine gets no reply and has no way of<br>
&gt; &gt;                 rebooting the host, the host status will be changed to<br>
&gt; &gt;                 Non-Responsive and the VMs will be unknown, because the<br>
&gt; &gt;                 engine has no way of knowing what&#39;s happening with the<br>
&gt; &gt;                 VMs.<br>
&gt; &gt;                 Since a reboot of the host will kill the VMs running on<br>
&gt; &gt;                 it, this will never cause any VM migration, but... along<br>
&gt; &gt;                 with the High-Availability VM feature, you will be able<br>
&gt; &gt;                 to have some of the VMs restarted on the second host<br>
&gt; &gt;                 after the host reboot (and that is only if Power<br>
&gt; &gt;                 Management was confirmed as successful).<br>
&gt; &gt;<br>
&gt; &gt;                 VM migration is only triggered when:<br>
&gt; &gt;                 1. The cluster configuration states that the VM should<br>
&gt; &gt;                 be migrated in case of failure.<br>
&gt; &gt;                 2. The engine has access to the host, so the failure is<br>
&gt; &gt;                 on the storage side and not the host side.<br>
&gt; &gt;                 3. The VMs are not actively writing (although there<br>
&gt; &gt;                 might be a new RFE for it).<br>
&gt; &gt;<br>
&gt; &gt;                 hope this clears things up<br>
&gt; &gt;<br>
&gt; &gt;                 Dafna<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;                 On 01/27/2014 10:11 AM, Andrew Lau wrote:<br>
&gt; &gt;<br>
&gt; &gt;                     Hi,<br>
&gt; &gt;<br>
&gt; &gt;                     Have you got power management enabled?<br>
&gt; &gt;<br>
&gt; &gt;                     That&#39;s the fencing feature required for the engine<br>
&gt; &gt;                     to ensure that the<br>
&gt; &gt;                     host is actually offline. It won&#39;t resume any<br>
&gt; &gt;                     other VMs to prevent<br>
&gt; &gt;                     potential VM corruption (e.g. a VM running on<br>
&gt; &gt;                     multiple hosts).<br>
&gt; &gt;<br>
&gt; &gt;                     Andrew.<br>
&gt; &gt;<br>
&gt; &gt;                     On Jan 27, 2014 5:12 PM, &quot;Jaison peter&quot;<br>
&gt; &gt;                     &lt;<a href="mailto:urotrip2@gmail.com">urotrip2@gmail.com</a>&gt; wrote:<br>
</div></div>
<div class="im">
&gt; &gt;<br>
&gt; &gt;                          Hi all,<br>
&gt; &gt;<br>
&gt; &gt;                          I was setting up a two-node oVirt cluster with the<br>
&gt; &gt;                          oVirt engine on a separate node. I completed the<br>
&gt; &gt;                          configuration and tested VM live migrations without<br>
&gt; &gt;                          any issues. Then, to check cluster HA, I powered<br>
&gt; &gt;                          down one host and expected the VMs running on that<br>
&gt; &gt;                          host to be migrated to the other one. But nothing<br>
&gt; &gt;                          happened: the engine detected the host as<br>
&gt; &gt;                          unreachable and marked it as non-operational, and<br>
&gt; &gt;                          the VMs that ran on that host went to<br>
&gt; &gt;                          &#39;unknown state&#39;. Is it not possible to set up a<br>
&gt; &gt;                          fully HA oVirt cluster with two nodes? Or is it a<br>
&gt; &gt;                          problem with my configuration? Please advise.<br>
&gt; &gt;<br>
&gt; &gt;                          Thanks &amp; Regards<br>
&gt; &gt;<br>
&gt; &gt;                          Alex<br>
&gt; &gt;<br>
&gt; &gt;                          _______________________________________________<br>
&gt; &gt;                          Users mailing list<br>
&gt; &gt;                     <a href="mailto:Users@ovirt.org">Users@ovirt.org</a><br>
</div>
<div class="im">&gt; &gt;                     <a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;<br>
</div>
<div class="HOEnZb"><div class="h5">
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;                 --<br>
&gt; &gt;                 Dafna Ron<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt; &gt;<br>
&gt;<br>
&gt;<br>
</div></div></blockquote></div><br></div>