[Users] two node ovirt cluster with HA

Karli Sjöberg Karli.Sjoberg at slu.se
Mon Jan 27 12:54:31 UTC 2014


On Mon, 2014-01-27 at 14:43 +0200, Tareq Alayan wrote:
> Hi,
> 
> Power management makes use of special *dedicated* hardware in order to 
> restart hosts independently of the host OS. The engine connects to a power 
> management device using a *dedicated* network IP address.
> The engine is capable of rebooting hosts that have entered a 
> non-operational or non-responsive state.
> The abilities provided by all power management devices are: check 
> status, start, stop and recycle (restart)...
> 
> In the case of a non-responsive host: all of the VMs that are currently 
> running on that host can also become non-responsive. However, the 
> non-responsive host keeps its lock on the hard disk images of all VMs it 
> is running. Attempting to start a VM on a different host and assigning 
> the second host write privileges for the virtual machine hard disk image 
> can cause data corruption.

Exactly! If Engine were to first check with power management that
the problematic/non-responsive Host is indeed down, there would be no
risk of data corruption.

That's why I suggested a change in the HA flow: first check whether the
Host is indeed down and, if so, just start the VMs on another Host.
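
A rough sketch of the flow I have in mind, in Python. This is not Engine
code: the PowerStatus values, the function name and the action strings are
all made up for illustration.

```python
from enum import Enum


class PowerStatus(Enum):
    """Hypothetical answers from a power management (PM) status check."""
    ON = "on"
    OFF = "off"
    UNREACHABLE = "unreachable"  # the PM device itself does not answer


def decide_ha_action(host_responds: bool, pm_status: PowerStatus) -> str:
    """Return the (made-up) action for the HA VMs of a single Host."""
    if host_responds:
        return "nothing"
    if pm_status is PowerStatus.OFF:
        # PM confirms the Host is powered off, so it cannot still be
        # writing to the disk images: restart the VMs right away.
        return "restart_vms_elsewhere"
    if pm_status is PowerStatus.ON:
        # Host has power but is silent; it may still hold the disks,
        # so it has to be fenced before anything is restarted.
        return "fence_then_restart_vms"
    # Neither Host nor PM answers: this looks like a network problem,
    # so nothing can be assumed about the Host.
    return "wait"
```

The only new branch compared to today is the PowerStatus.OFF one, which
skips waiting for the Host to boot up again.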

/K

> Rebooting allows the engine to assume that the lock on a VM hard disk 
> image has been released.
> The engine can know for sure that the problematic host has been rebooted 
> via the power management device and then it can start a VM from the 
> problematic host on another host without risking data corruption.
> Important note: A virtual machine that has been marked highly-available 
> can not be safely started on a different host without the certainty that 
> doing so will not cause data corruption.
> 
> N-joy,
> 
> --Tareq
> 
> 
> 
> On 01/27/2014 02:05 PM, Dafna Ron wrote:
> > I am adding Tareq for the Power Management implementation.
> >
> > Dafna
> >
> >
> > On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
> >> On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
> >>> Powering off the host will never trigger vm migration.
> >>> As far as engine is concerned it just lost connection to the host, but
> >>> has no way of telling if the host is down or if a router is down.
> >> Can't it at least check with power management if the Host status is down
> >> first?
> >>
> >> I mean, if the network is down there will be no response from either PM
> >> or Host. But if PM is up and can tell you that the Host is down, sounds
> >> rather clear cut to me...
> >>
> >> Seems to me the VMs would be restarted sooner if the flow was altered
> >> to first check with PM if it's a network or Host issue, and if it is a
> >> Host issue, immediately restart the VMs on another Host, instead of
> >> waiting for a potentially problematic Host to boot up eventually.
> >>
> >> /K
> >>
> >>> Since VMs can continue running on the host even if engine has no access
> >>> to it, starting the VMs on the second host can cause split brain and
> >>> data corruption.
> >>>
> >>> The way that the engine knows what's going on is by sending health check
> >>> queries to vdsm.
> >>> Power management will try to reboot a host when the health checks to
> >>> vdsm are not answered.
> >>> So... if engine gets no reply and has no way of rebooting the host, the
> >>> host status will be changed to Non-Responsive and the VMs will be
> >>> unknown because engine has no way of knowing what's happening with the
> >>> VMs.
> >>> Since a reboot of the host will kill the VMs running on it, this will
> >>> never cause any VM migration. But along with the High-Availability VM
> >>> feature, you will be able to have some of the VMs restarted on the
> >>> second host after the host reboot (and that is only if Power Management
> >>> was confirmed as successful).
> >>>
> >>> VM migration is only triggered when:
> >>> 1. Cluster configuration states that the VM should be migrated in case
> >>> of failure
> >>> 2. Engine has access to the host - so the failure is on the storage side
> >>> and not the host side.
> >>> 3. The VMs are not actively writing (although there might be a new RFE
> >>> for it).
> >>>
> >>> hope this clears things up
> >>>
> >>> Dafna
> >>>
> >>>
> >>>
> >>> On 01/27/2014 10:11 AM, Andrew Lau wrote:
> >>>> Hi,
> >>>>
> >>>> Have you got power management enabled?
> >>>>
> >>>> That's the fencing feature required for the engine to ensure that the
> >>>> host is actually offline. It won't resume any other VMs to prevent
> >>>> potential VM corruption (eg. VM running on multiple hosts).
> >>>>
> >>>> Andrew.
> >>>>
> >>>> On Jan 27, 2014 5:12 PM, "Jaison peter" <urotrip2 at gmail.com
> >>>> <mailto:urotrip2 at gmail.com>> wrote:
> >>>>
> >>>>      Hi all,
> >>>>
> >>>>      I was setting up a two-node oVirt cluster with the oVirt engine on
> >>>>      a separate node. I completed the configuration and tested VM live
> >>>>      migrations without any issues. Then, to check cluster HA, I
> >>>>      powered down one host and expected the VMs running on that host to
> >>>>      be migrated to the other one. But nothing happened: Engine detected
> >>>>      the host as unreachable and marked it as non-operational, and the
> >>>>      VMs that ran on that host went to 'unknown state'. Is it not
> >>>>      possible to set up a fully HA oVirt cluster with two nodes? Or is
> >>>>      this a problem with my configuration? Please advise.
> >>>>
> >>>>      Thanks & Regards
> >>>>
> >>>>      Alex
> >>>>
> >>>>      _______________________________________________
> >>>>      Users mailing list
> >>>>      Users at ovirt.org <mailto:Users at ovirt.org>
> >>>>      http://lists.ovirt.org/mailman/listinfo/users
> >>>>
> >>>>
> >>>>
> >>>
> >>> -- 
> >>> Dafna Ron
> >>
> >>
> >
> >
> 
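
As I read Dafna's and Tareq's descriptions quoted above, the current
behaviour boils down to roughly the following. Again only a sketch in
Python: the function name, parameters and return strings are my own
invention, and only the Non-Responsive/unknown outcome and the
"restart only after a confirmed reboot" rule come from the thread.

```python
def current_ha_outcome(host_responds: bool,
                       pm_configured: bool,
                       reboot_confirmed: bool) -> str:
    """Describe what happens to the HA VMs of one Host (illustrative only)."""
    if host_responds:
        # Health check queries to vdsm are answered: nothing to do.
        return "vms_keep_running"
    if not pm_configured:
        # No way of rebooting the Host: it becomes Non-Responsive and
        # its VMs go to unknown, since they may in fact still be running.
        return "host_non_responsive_vms_unknown"
    if reboot_confirmed:
        # The reboot through PM killed the VMs and released the disk
        # locks, so HA VMs can be restarted on the other Host.
        return "ha_vms_restarted_on_other_host"
    # PM is configured but the reboot could not be confirmed.
    return "host_non_responsive_vms_unknown"
```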



-- 

Best regards

-------------------------------------------------------------------------------
Karli Sjöberg
Swedish University of Agricultural Sciences Box 7079 (Visiting Address
Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone:  +46-(0)18-67 15 66
karli.sjoberg at slu.se

