<div dir="ltr"><div>Thank you all for your valuable feedback.<br><br></div>Can you please specify some of the supported fencing devices in oVirt?<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Jan 27, 2014 at 9:10 PM, Eli Mesika <span dir="ltr"><<a href="mailto:emesika@redhat.com" target="_blank">emesika@redhat.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
----- Original Message -----<br>
> From: "Tareq Alayan" <<a href="mailto:talayan@redhat.com">talayan@redhat.com</a>><br>
> To: "Andrew Lau" <<a href="mailto:andrew@andrewklau.com">andrew@andrewklau.com</a>>, "Eli Mesika" <<a href="mailto:emesika@redhat.com">emesika@redhat.com</a>><br>
> Cc: <a href="mailto:dron@redhat.com">dron@redhat.com</a>, "Karli Sjöberg" <<a href="mailto:Karli.Sjoberg@slu.se">Karli.Sjoberg@slu.se</a>>, <a href="mailto:users@ovirt.org">users@ovirt.org</a><br>
> Sent: Monday, January 27, 2014 2:59:02 PM<br>
> Subject: Re: [Users] two node ovirt cluster with HA<br>
><br>
> Adding Eli.<br>
<br>
I just want to summarize the requirement as I understand it:<br>
<br>
In the case where a host that is running HA VMs and has PM configured is turned off manually:<br>
<br>
1) The non-responsive treatment should be modified to check the host status via the PM agent<br>
2) If the host is off, HA VMs will attempt to run on another host ASAP<br>
3) The host status should be set to DOWN<br>
4) No attempt to restart vdsm (soft fencing) or restart the host (hard fencing) will be done<br>
<br>
Is the above correct? If so, an RFE for it can be opened.<br>
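<br>
The four points above can be sketched as a small decision function. This is only an illustration of the proposed flow, not oVirt engine code: PowerStatus, handle_non_responsive_host and the returned status strings are hypothetical names, and the behaviour for a powered-on host or an unreachable PM agent is an assumption based on the rest of this thread.<br>

```python
from enum import Enum


class PowerStatus(Enum):
    """Hypothetical result of asking the PM agent for the host status."""
    ON = "on"
    OFF = "off"
    UNREACHABLE = "unreachable"  # the PM agent itself did not answer


def handle_non_responsive_host(pm_status, ha_vms):
    """Return (host_status, vms_to_restart, fence_host) for a non-responsive host.

    Illustrative only: the names and return values are assumptions,
    not the engine's actual API.
    """
    if pm_status is PowerStatus.OFF:
        # Points 2-4: the host is confirmed off, so HA VMs are restarted
        # elsewhere, the host is marked DOWN, and no fencing is attempted.
        return "DOWN", list(ha_vms), False
    if pm_status is PowerStatus.ON:
        # Assumption: the host has power but does not answer, so the regular
        # fencing flow applies and VMs wait until fencing is confirmed.
        return "NON_RESPONSIVE", [], True
    # Assumption: the PM agent is unreachable, so the host state cannot be
    # verified and nothing is restarted without manual intervention.
    return "NON_RESPONSIVE", [], False


if __name__ == "__main__":
    print(handle_non_responsive_host(PowerStatus.OFF, ["vm1", "vm2"]))
```

The point of the sketch is that VMs are only restarted on another host once the power state is known, which keeps the data-corruption concern raised earlier in the thread in view.<br>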
<div class="im"><br>
><br>
><br>
> On 01/27/2014 02:50 PM, Andrew Lau wrote:<br>
</div><div class="im">> > Hi,<br>
> ><br>
> > I think he was asking what if the power management device reported<br>
> > that the host was powered off. Then VMs should be brought back up as<br>
> > being off would essentially be the same as running a power cycle/reboot?<br>
> ><br>
> > Another example I'm seeing is what happens if the whole host loses<br>
> > power and its power management device then becomes unavailable (i.e.<br>
> > not reachable) then you're stuck in the case where it requires manual<br>
> > intervention.<br>
> ><br>
> > I would be interested to potentially see something like a timeout on<br>
> > those problematic VMs (eg. if nothing was read or write after x amount<br>
> > of time) then you could consider the host as offline? I guess then<br>
> > that adds a lot of risk..<br>
> ><br>
> ><br>
> > On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <<a href="mailto:talayan@redhat.com">talayan@redhat.com</a><br>
</div><div><div class="h5">> > <mailto:<a href="mailto:talayan@redhat.com">talayan@redhat.com</a>>> wrote:<br>
> ><br>
> > Hi,<br>
> ><br>
> > Power management makes use of special *dedicated* hardware in<br>
> > order to restart hosts independently of host OS. The engine<br>
> > connects to a power management device using a *dedicated* network<br>
> > IP address.<br>
> > The engine is capable of rebooting hosts that have entered a<br>
> > non-operational or non-responsive state.<br>
> > The abilities provided by all power management devices are: check<br>
> > status, start, stop and recycle (restart)...<br>
> ><br>
> > In the case of non-responsive host: all of the VMs that are<br>
> > currently running on that host can also become non-responsive.<br>
> > However, the non-responsive host keeps locking the VM hard disk<br>
> > for all VMs it is running. Attempting to start a VM on a different<br>
> > host and assign the second host write privileges for the virtual<br>
> > machine hard disk image can cause data corruption.<br>
> > Rebooting allows the engine to assume that the lock on a VM hard<br>
> > disk image has been released.<br>
> > The engine can know for sure that the problematic host has been<br>
> > rebooted via the power management device and then it can start a<br>
> > VM from the problematic host on another host without risking data<br>
> > corruption.<br>
> > Important note: A virtual machine that has been marked<br>
> > highly-available can not be safely started on a different host<br>
> > without the certainty that doing so will not cause data corruption.<br>
> ><br>
> > N-joy,<br>
> ><br>
> > --Tareq<br>
> ><br>
> ><br>
> ><br>
> ><br>
> > On 01/27/2014 02:05 PM, Dafna Ron wrote:<br>
> ><br>
> > I am adding Tareq for the Power Management implementation.<br>
> ><br>
> > Dafna<br>
> ><br>
> ><br>
> > On 01/27/2014 11:48 AM, Karli Sjöberg wrote:<br>
> ><br>
> > On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:<br>
> ><br>
> > Powering off the host will never trigger vm migration.<br>
> > As far as engine is concerned it just lost connection<br>
> > to the host, but<br>
> > has no way of telling if the host is down or if a<br>
> > router is down.<br>
> ><br>
> > Can't it at least check with power management if the Host<br>
> > status is down<br>
> > first?<br>
> ><br>
> > I mean, if the network is down there will be no response<br>
> > from either PM<br>
> > or Host. But if PM is up and can tell you that the Host is<br>
> > down, sounds<br>
> > rather clear cut to me...<br>
> ><br>
> > Seems to me the VM's would be restarted sooner if the flow<br>
> > was altered<br>
> > to first check with PM if it's a network or Host issue,<br>
> > and if Host<br>
> > issue, immediately restart VM's on another Host, instead<br>
> > of waiting for<br>
> > a potentially problematic Host to boot up eventually.<br>
> ><br>
> > /K<br>
> ><br>
> > since vm's can continue running on the host even if<br>
> > engine has no access<br>
> > to it, starting the vm's on the second host can cause<br>
> > split brain and<br>
> > data corruption.<br>
> ><br>
> > The way that the engine knows what's going on is by<br>
> > sending health check<br>
> > queries to the vdsm.<br>
> > Power management will try to reboot a host when the<br>
> > health checks to<br>
> > vdsm are not answered.<br>
> > So... if engine gets no reply and has no way of<br>
> > rebooting the host, the<br>
> > host status will be changed to Non-Responsive and the<br>
> > vm's will be<br>
> > unknown because engine has no way of knowing what's<br>
> > happening with the<br>
> > vm's.<br>
> > Since reboot of the host will kill the vm's running on<br>
> > it - this will<br>
> > never cause any vm migration but... along with the<br>
> > High-Availability vm<br>
> > feature, you will be able to have some of the vm's<br>
> > re-started on the<br>
> > second host after the host reboot (and that is only if<br>
> > Power Management<br>
> > was confirmed as successful).<br>
> ><br>
> > VM migration is only triggered when:<br>
> > 1. Cluster configuration states that the vm should be<br>
> > migrated in case<br>
> > of failure<br>
> > 2. Engine has access to the host - so the failure is<br>
> > on the storage side<br>
> > and not the host side.<br>
> > 3. the vms are not actively writing (although there<br>
> > might be a new RFE<br>
> > for it).<br>
> ><br>
> > hope this clears things up<br>
> ><br>
> > Dafna<br>
> ><br>
> ><br>
> ><br>
> > On 01/27/2014 10:11 AM, Andrew Lau wrote:<br>
> ><br>
> > Hi,<br>
> ><br>
> > Have you got power management enabled?<br>
> ><br>
> > That's the fencing feature required for the engine<br>
> > to ensure that the<br>
> > host is actually offline. It won't resume any<br>
> > other VMs to prevent<br>
> > potential VM corruption (eg. VM running on<br>
> > multiple hosts).<br>
> ><br>
> > Andrew.<br>
> ><br>
> > On Jan 27, 2014 5:12 PM, "Jaison peter"<br>
> > <<a href="mailto:urotrip2@gmail.com">urotrip2@gmail.com</a> <mailto:<a href="mailto:urotrip2@gmail.com">urotrip2@gmail.com</a>><br>
</div></div>> > <mailto:<a href="mailto:urotrip2@gmail.com">urotrip2@gmail.com</a><br>
<div class="im">> > <mailto:<a href="mailto:urotrip2@gmail.com">urotrip2@gmail.com</a>>>> wrote:<br>
> ><br>
> > Hi all ,<br>
> ><br>
> > I was setting up a two-node oVirt cluster with<br>
> > the oVirt engine on<br>
> > a separate node. I completed the configuration<br>
> > and tested VM live<br>
> > migrations without any issues. Then, to<br>
> > check cluster HA, I<br>
> > powered down one host and expected the VMs<br>
> > running on that host to be<br>
> > migrated to the other one. But nothing<br>
> > happened. The engine detected<br>
> > the host as unreachable and marked it as<br>
> > non-operational, and the VMs that ran on<br>
> > that host went to 'unknown state'. Is it<br>
> > not possible to set up<br>
> > a fully HA oVirt cluster with two nodes? Or<br>
> > is it a problem with my<br>
> > configuration? Please advise.<br>
> ><br>
> > Thanks & Regards<br>
> ><br>
> > Alex<br>
> ><br>
> > _______________________________________________<br>
> > Users mailing list<br>
> > <a href="mailto:Users@ovirt.org">Users@ovirt.org</a> <mailto:<a href="mailto:Users@ovirt.org">Users@ovirt.org</a>><br>
</div>> > <mailto:<a href="mailto:Users@ovirt.org">Users@ovirt.org</a> <mailto:<a href="mailto:Users@ovirt.org">Users@ovirt.org</a>>><br>
<div class="im">> > <a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
> ><br>
> ><br>
> ><br>
</div><div class="HOEnZb"><div class="h5">
> ><br>
> ><br>
> > --<br>
> > Dafna Ron<br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
><br>
><br>
</div></div></blockquote></div><br></div>