[Users] two node ovirt cluster with HA
Eli Mesika
emesika at redhat.com
Mon Jan 27 15:40:21 UTC 2014
----- Original Message -----
> From: "Tareq Alayan" <talayan at redhat.com>
> To: "Andrew Lau" <andrew at andrewklau.com>, "Eli Mesika" <emesika at redhat.com>
> Cc: dron at redhat.com, "Karli Sjöberg" <Karli.Sjoberg at slu.se>, users at ovirt.org
> Sent: Monday, January 27, 2014 2:59:02 PM
> Subject: Re: [Users] two node ovirt cluster with HA
>
> Adding Eli.
I just want to summarize the requirement as I understand it:
In the case that a Host that is running HA VMs and has PM configured is turned off manually:
1) The non-responsive treatment should be modified to check Host status via the PM agent
2) If the Host is off, the engine will attempt to run its HA VMs on another host ASAP
3) The host status should be set to DOWN
4) No attempt to restart vdsm (soft fencing) or restart the host (hard fencing) will be made
Is the above correct? If so, an RFE for it can be opened.
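
As a rough sketch of the flow summarized above (illustrative Python only, not engine code; the names PowerStatus, Plan and plan_for_non_responsive_host are made up for this example, and the "PM is on" and "PM is unreachable" branches are my reading of the behaviour described further down in this thread):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class PowerStatus(Enum):
    """What the PM agent reports for a non-responsive host (hypothetical)."""
    ON = "on"
    OFF = "off"
    UNREACHABLE = "unreachable"  # the PM agent itself did not answer


@dataclass
class Plan:
    """What the engine would do next (hypothetical structure)."""
    host_status: str
    restart_ha_vms_now: bool
    fencing_steps: List[str] = field(default_factory=list)


def plan_for_non_responsive_host(pm_status: PowerStatus) -> Plan:
    if pm_status is PowerStatus.OFF:
        # Proposed change: the host is known to be off, so it cannot hold
        # any disk locks. Mark it DOWN, start HA VMs elsewhere, skip fencing.
        return Plan("DOWN", True, [])
    if pm_status is PowerStatus.ON:
        # Unchanged flow: soft fencing (restart vdsm), then hard fencing
        # (restart the host). HA VMs wait until fencing is confirmed.
        return Plan("NON_RESPONSIVE", False, ["soft", "hard"])
    # PM agent unreachable: the host state is unknown, so starting the VMs
    # elsewhere risks data corruption. Manual intervention is needed.
    return Plan("NON_RESPONSIVE", False, [])


if __name__ == "__main__":
    for status in PowerStatus:
        print(status.value, plan_for_non_responsive_host(status))
```

The point of keeping the three cases separate is that only a positive "off" answer from the PM agent is treated as proof that the disk locks are released; no answer at all is never treated as "off".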
>
>
> On 01/27/2014 02:50 PM, Andrew Lau wrote:
> > Hi,
> >
> > I think he was asking what if the power management device reported
> > that the host was powered off. Then VMs should be brought back up as
> > being off would essentially be the same as running a power cycle/reboot?
> >
> > Another example I'm seeing is what happens if the whole host loses
> > power and its power management device then becomes unavailable (i.e.
> > not reachable); then you're stuck in a case that requires manual
> > intervention.
> >
> > I would be interested to potentially see something like a timeout on
> > those problematic VMs (e.g. if nothing was read or written after x amount
> > of time); then you could consider the host as offline? I guess that
> > adds a lot of risk, though.
> >
> >
> > On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <talayan at redhat.com> wrote:
> >
> > Hi,
> >
> > Power management makes use of special *dedicated* hardware in
> > order to restart hosts independently of the host OS. The engine
> > connects to a power management device using a *dedicated* network
> > IP address.
> > The engine is capable of rebooting hosts that have entered a
> > non-operational or non-responsive state.
> > The abilities provided by all power management devices are: check
> > status, start, stop and recycle (restart)...
> >
> > In the case of non-responsive host: all of the VMs that are
> > currently running on that host can also become non-responsive.
> > However, the non-responsive host keeps locking the VM hard disk
> > for all VMs it is running. Attempting to start a VM on a different
> > host and assign the second host write privileges for the virtual
> > machine hard disk image can cause data corruption.
> > Rebooting allows the engine to assume that the lock on a VM hard
> > disk image has been released.
> > The engine can know for sure that the problematic host has been
> > rebooted via the power management device and then it can start a
> > VM from the problematic host on another host without risking data
> > corruption.
> > Important note: A virtual machine that has been marked
> > highly-available can not be safely started on a different host
> > without the certainty that doing so will not cause data corruption.
> >
> > N-joy,
> >
> > --Tareq
> >
> >
> >
> >
> > On 01/27/2014 02:05 PM, Dafna Ron wrote:
> >
> > I am adding Tareq for the Power Management implementation.
> >
> > Dafna
> >
> >
> > On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
> >
> > On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
> >
> > Powering off the host will never trigger vm migration.
> > As far as engine is concerned it just lost connection
> > to the host, but
> > has no way of telling if the host is down or if a
> > router is down.
> >
> > Can't it at least check with power management if the Host
> > status is down
> > first?
> >
> > I mean, if the network is down there will be no response
> > from either PM
> > or Host. But if PM is up and can tell you that the Host is
> > down, sounds
> > rather clear cut to me...
> >
> > Seems to me the VM's would be restarted sooner if the flow
> > was altered
> > to first check with PM if it's a network or Host issue,
> > and if Host
> > issue, immediately restart VM's on another Host, instead
> > of waiting for
> > a potentially problematic Host to boot up eventually.
> >
> > /K
> >
> > since vm's can continue running on the host even if
> > engine has no access
> > to it, starting the vm's on the second host can cause
> > split brain and
> > data corruption.
> >
> > The way that the engine knows what's going on is by
> > sending health check
> > queries to the vdsm.
> > Power management will try to reboot a host when the
> > health checks to
> > vdsm are not answered.
> > So... if engine gets no reply and has no way of
> > rebooting the host, the
> > host status will be changed to Non-Responsive and the
> > vm's will be
> > unknown because engine has no way of knowing what's
> > happening with the
> > vm's.
> > Since reboot of the host will kill the vm's running on
> > it - this will
> > never cause any vm migration but... along with the
> > High-Availability vm
> > feature, you will be able to have some of the vm's
> > re-started on the
> > second host after the host reboot (and that is only if
> > Power Management
> > was confirmed as successful).
> >
> > VM migration is only triggered when:
> > 1. Cluster configuration states that the vm should be
> > migrated in case
> > of failure
> > 2. Engine has access to the host - so the failure is
> > on the storage side
> > and not the host side.
> > 3. the vms are not actively writing (although there
> > might be a new RFE
> > for it).
> >
> > hope this clears things up
> >
> > Dafna
> >
> >
> >
> > On 01/27/2014 10:11 AM, Andrew Lau wrote:
> >
> > Hi,
> >
> > Have you got power management enabled?
> >
> > That's the fencing feature required for the engine
> > to ensure that the
> > host is actually offline. It won't resume any
> > other VMs to prevent
> > potential VM corruption (eg. VM running on
> > multiple hosts).
> >
> > Andrew.
> >
> > On Jan 27, 2014 5:12 PM, "Jaison peter" <urotrip2 at gmail.com> wrote:
> >
> > Hi all ,
> >
> > I was setting up a two node ovirt cluster with the ovirt engine on a
> > separate node. I completed the configuration and tested VM live
> > migrations without any issues. Then, to check cluster HA, I powered
> > down one host and expected the VMs running on that host to be
> > migrated to the other one. But nothing happened: the engine detected
> > the host as unreachable and marked it as non-operational, and the VMs
> > that ran on that host went to 'unknown state'. Is it not possible to
> > set up a fully HA ovirt cluster with two nodes? Or is it a problem
> > with my configuration? Please advise.
> >
> > Thanks & Regards
> >
> > Alex
> >
> > _______________________________________________
> > Users mailing list
> > Users at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> >
> >
> >
> > --
> > Dafna Ron
>
>