<div dir="ltr">Thanks !<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jan 28, 2014 at 2:04 PM, Eli Mesika <span dir="ltr"><<a href="mailto:emesika@redhat.com" target="_blank">emesika@redhat.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im"><br>
<br>
----- Original Message -----<br>
> From: "Jaison peter" <<a href="mailto:urotrip2@gmail.com">urotrip2@gmail.com</a>><br>
> To: "Eli Mesika" <<a href="mailto:emesika@redhat.com">emesika@redhat.com</a>><br>
> Cc: <a href="mailto:users@ovirt.org">users@ovirt.org</a>, "Tareq Alayan" <<a href="mailto:talayan@redhat.com">talayan@redhat.com</a>><br>
> Sent: Tuesday, January 28, 2014 7:33:35 AM<br>
> Subject: Re: [Users] two node ovirt cluster with HA<br>
><br>
</div><div class="im">> Thank you all for your valuable feedback.<br>
><br>
> Can you please specify some of the supported fencing devices in oVirt?<br>
<br>
</div>For oVirt 3.4:<br>
<br>
apc,apc_snmp,bladecenter,cisco_ucs,drac5,drac7,eps,hpblade,ilo,ilo2,ilo3,ilo4,ipmilan,rsa,rsb,wti<br>
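<br>
As a rough illustration only (this is not engine code), the list above can be turned into a quick lookup for checking an agent type before configuring power management. The names SUPPORTED_AGENTS_3_4 and is_supported_agent are made up for this sketch:<br>
<br>
<pre>
```python
# Agent types copied verbatim from the oVirt 3.4 list above.
SUPPORTED_AGENTS_3_4 = frozenset(
    "apc,apc_snmp,bladecenter,cisco_ucs,drac5,drac7,eps,hpblade,"
    "ilo,ilo2,ilo3,ilo4,ipmilan,rsa,rsb,wti".split(",")
)


def is_supported_agent(agent_type):
    """Return True if agent_type is in the 3.4 list (hypothetical helper)."""
    # Normalise whitespace and case so " ILO4 " matches "ilo4".
    return agent_type.strip().lower() in SUPPORTED_AGENTS_3_4
```
</pre>
For example, is_supported_agent("ipmilan") is true, while a name that is not in the list is rejected.<br>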
<div class="HOEnZb"><div class="h5"><br>
><br>
><br>
> On Mon, Jan 27, 2014 at 9:10 PM, Eli Mesika <<a href="mailto:emesika@redhat.com">emesika@redhat.com</a>> wrote:<br>
><br>
> ><br>
> ><br>
> > ----- Original Message -----<br>
> > > From: "Tareq Alayan" <<a href="mailto:talayan@redhat.com">talayan@redhat.com</a>><br>
> > > To: "Andrew Lau" <<a href="mailto:andrew@andrewklau.com">andrew@andrewklau.com</a>>, "Eli Mesika" <<br>
> > <a href="mailto:emesika@redhat.com">emesika@redhat.com</a>><br>
> > > Cc: <a href="mailto:dron@redhat.com">dron@redhat.com</a>, "Karli Sjöberg" <<a href="mailto:Karli.Sjoberg@slu.se">Karli.Sjoberg@slu.se</a>>,<br>
> > <a href="mailto:users@ovirt.org">users@ovirt.org</a><br>
> > > Sent: Monday, January 27, 2014 2:59:02 PM<br>
> > > Subject: Re: [Users] two node ovirt cluster with HA<br>
> > ><br>
> > > Adding Eli.<br>
> ><br>
> > I just want to summarize the requirement as I understand it:<br>
> ><br>
> > In the case where a host that is running HA VMs and has PM configured is<br>
> > turned off manually:<br>
> ><br>
> > 1) The non-responsive treatment should be modified to check the host status<br>
> > via the PM agent<br>
> > 2) If the host is off, HA VMs will attempt to run on another host ASAP<br>
> > 3) The host status should be set to DOWN<br>
> > 4) No attempt to restart vdsm (soft fencing) or restart the host (hard<br>
> > fencing) will be made<br>
> ><br>
> > Is the above correct? If so, an RFE for it can be opened.<br>
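<br>
The four steps above could be sketched roughly as follows. This is a hypothetical model of the proposed behaviour, not actual engine code: PowerStatus, handle_non_responsive and the action names are invented for illustration. The branches for a host that is still powered on and for an unreachable PM agent are assumptions based on the rest of this thread (soft fencing before hard fencing, and no VM restart when the host state cannot be confirmed):<br>
<br>
<pre>
```python
from enum import Enum


class PowerStatus(Enum):
    """Result of asking the PM agent for the host power state (hypothetical)."""
    ON = "on"
    OFF = "off"
    UNREACHABLE = "unreachable"  # the PM agent itself did not answer


def handle_non_responsive(power_status, has_ha_vms):
    """Return the ordered actions for a non-responsive host.

    Step 1 of the summary (checking status via the PM agent) is assumed to
    have produced power_status already.
    """
    if power_status is PowerStatus.OFF:
        actions = []
        if has_ha_vms:
            actions.append("restart_ha_vms_elsewhere")  # step 2
        actions.append("set_host_down")  # step 3
        return actions  # step 4: neither soft nor hard fencing is attempted
    if power_status is PowerStatus.ON:
        # Assumed existing treatment: restart vdsm first, then the host.
        return ["soft_fence_restart_vdsm", "hard_fence_restart_host"]
    # PM unreachable: the host state is unknown, so no VM is restarted.
    return ["keep_non_responsive"]
```
</pre>
With this model, a host that PM reports as off and that runs HA VMs yields the actions restart_ha_vms_elsewhere followed by set_host_down, and no fencing action at all.<br>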
> ><br>
> > ><br>
> > ><br>
> > > On 01/27/2014 02:50 PM, Andrew Lau wrote:<br>
> > > > Hi,<br>
> > > ><br>
> > > > I think he was asking what if the power management device reported<br>
> > > > that the host was powered off. Then VMs should be brought back up as<br>
> > > > being off would essentially be the same as running a power<br>
> > cycle/reboot?<br>
> > > ><br>
> > > > Another example I'm seeing is what happens if the whole host loses<br>
> > > > power and its power management device then becomes unavailable (i.e.<br>
> > > > not reachable); then you're stuck in a case that requires manual<br>
> > > > intervention.<br>
> > > ><br>
> > > > I would be interested to potentially see something like a timeout on<br>
> > > > those problematic VMs (e.g. if nothing was read or written after x amount<br>
> > > > of time); then you could consider the host offline? I guess that<br>
> > > > adds a lot of risk, though.<br>
> > > ><br>
> > > ><br>
> > > > On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <<a href="mailto:talayan@redhat.com">talayan@redhat.com</a>> wrote:<br>
> > > ><br>
> > > > Hi,<br>
> > > ><br>
> > > > Power management makes use of special *dedicated* hardware in<br>
> > > > order to restart hosts independently of the host OS. The engine<br>
> > > > connects to a power management device using a *dedicated* network<br>
> > > > IP address.<br>
> > > > The engine is capable of rebooting hosts that have entered a<br>
> > > > non-operational or non-responsive state.<br>
> > > > The abilities provided by all power management devices are: check<br>
> > > > status, start, stop and recycle (restart)...<br>
> > > ><br>
> > > > In the case of non-responsive host: all of the VMs that are<br>
> > > > currently running on that host can also become non-responsive.<br>
> > > > However, the non-responsive host keeps locking the VM hard disk<br>
> > > > for all VMs it is running. Attempting to start a VM on a different<br>
> > > > host and assign the second host write privileges for the virtual<br>
> > > > machine hard disk image can cause data corruption.<br>
> > > > Rebooting allows the engine to assume that the lock on a VM hard<br>
> > > > disk image has been released.<br>
> > > > The engine can know for sure that the problematic host has been<br>
> > > > rebooted via the power management device and then it can start a<br>
> > > > VM from the problematic host on another host without risking data<br>
> > > > corruption.<br>
> > > > Important note: A virtual machine that has been marked<br>
> > > > highly-available can not be safely started on a different host<br>
> > > > without the certainty that doing so will not cause data corruption.<br>
> > > ><br>
> > > > N-joy,<br>
> > > ><br>
> > > > --Tareq<br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > > On 01/27/2014 02:05 PM, Dafna Ron wrote:<br>
> > > ><br>
> > > > I am adding Tareq for the Power Management implementation.<br>
> > > ><br>
> > > > Dafna<br>
> > > ><br>
> > > ><br>
> > > > On 01/27/2014 11:48 AM, Karli Sjöberg wrote:<br>
> > > ><br>
> > > > On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:<br>
> > > ><br>
> > > > Powering off the host will never trigger vm migration.<br>
> > > > As far as engine is concerned it just lost connection<br>
> > > > to the host, but<br>
> > > > has no way of telling if the host is down or if a<br>
> > > > router is down.<br>
> > > ><br>
> > > > Can't it at least check with power management if the Host status<br>
> > > > is down first?<br>
> > > ><br>
> > > > I mean, if the network is down there will be no response from<br>
> > > > either PM or Host. But if PM is up and can tell you that the Host<br>
> > > > is down, sounds rather clear cut to me...<br>
> > > ><br>
> > > > Seems to me the VMs would be restarted sooner if the flow was<br>
> > > > altered to first check with PM if it's a network or Host issue,<br>
> > > > and if Host issue, immediately restart VMs on another Host, instead<br>
> > > > of waiting for a potentially problematic Host to boot up eventually.<br>
> > > ><br>
> > > > /K<br>
> > > ><br>
> > > > since vm's can continue running on the host even if<br>
> > > > engine has no access<br>
> > > > to it, starting the vm's on the second host can cause<br>
> > > > split brain and<br>
> > > > data corruption.<br>
> > > ><br>
> > > > The way that the engine knows what's going on is by sending<br>
> > > > health check queries to the vdsm.<br>
> > > > Power management will try to reboot a host when the health checks<br>
> > > > to vdsm are not answered.<br>
> > > > So... if engine gets no reply and has no way of<br>
> > > > rebooting the host, the<br>
> > > > host status will be changed to Non-Responsive and the<br>
> > > > vm's will be<br>
> > > > unknown because engine has no way of knowing what's<br>
> > > > happening with the<br>
> > > > vm's.<br>
> > > > Since reboot of the host will kill the vm's running on<br>
> > > > it - this will<br>
> > > > never cause any vm migration but... along with the<br>
> > > > High-Availability vm<br>
> > > > feature, you will be able to have some of the vm's<br>
> > > > re-started on the<br>
> > > > second host after the host reboot (and that is only if<br>
> > > > Power Management<br>
> > > > was confirmed as successful).<br>
> > > ><br>
> > > > VM migration is only triggered when:<br>
> > > > 1. Cluster configuration states that the vm should be<br>
> > > > migrated in case<br>
> > > > of failure<br>
> > > > 2. Engine has access to the host - so the failure is<br>
> > > > on the storage side<br>
> > > > and not the host side.<br>
> > > > 3. the vms are not actively writing (although there<br>
> > > > might be a new RFE<br>
> > > > for it).<br>
> > > ><br>
> > > > hope this clears things up<br>
> > > ><br>
> > > > Dafna<br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > > On 01/27/2014 10:11 AM, Andrew Lau wrote:<br>
> > > ><br>
> > > > Hi,<br>
> > > ><br>
> > > > Have you got power management enabled?<br>
> > > ><br>
> > > > That's the fencing feature required for the engine<br>
> > > > to ensure that the<br>
> > > > host is actually offline. It won't resume any<br>
> > > > other VMs to prevent<br>
> > > > potential VM corruption (eg. VM running on<br>
> > > > multiple hosts).<br>
> > > ><br>
> > > > Andrew.<br>
> > > ><br>
> > > > On Jan 27, 2014 5:12 PM, "Jaison peter"<br>
> > > > <<a href="mailto:urotrip2@gmail.com">urotrip2@gmail.com</a>> wrote:<br>
> > > ><br>
> > > > Hi all ,<br>
> > > ><br>
> > > > I was setting up a two-node oVirt cluster with the<br>
> > > > oVirt engine on a separate node. I completed the<br>
> > > > configuration and tested VM live migrations without<br>
> > > > any issues. Then, to check cluster HA, I powered down<br>
> > > > one host and expected the VMs running on that host to<br>
> > > > be migrated to the other one. But nothing happened:<br>
> > > > the engine detected the host as unreachable and marked<br>
> > > > it as non-operational, and the VMs that ran on that<br>
> > > > host went to 'unknown state'. Is it not possible to<br>
> > > > set up a fully HA oVirt cluster with two nodes? Or is<br>
> > > > it a problem with my configuration? Please advise.<br>
> > > ><br>
> > > > Thanks & Regards<br>
> > > ><br>
> > > > Alex<br>
> > > ><br>
> > > ><br>
> > _______________________________________________<br>
> > > > Users mailing list<br>
> > > > <a href="mailto:Users@ovirt.org">Users@ovirt.org</a> <mailto:<a href="mailto:Users@ovirt.org">Users@ovirt.org</a>><br>
> > > > <mailto:<a href="mailto:Users@ovirt.org">Users@ovirt.org</a> <mailto:<a href="mailto:Users@ovirt.org">Users@ovirt.org</a>>><br>
> > > > <a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > > --<br>
> > > > Dafna Ron<br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > ><br>
> > ><br>
> ><br>
><br>
</div></div></blockquote></div><br></div>