[Users] two node ovirt cluster with HA

Tue Jan 28 05:33:35 UTC 2014

Thank you all for your valuable feedback.

Can you please list some of the fencing devices supported in oVirt?

On Mon, Jan 27, 2014 at 9:10 PM, Eli Mesika <emesika at redhat.com> wrote:

>
>
> ----- Original Message -----
> > From: "Tareq Alayan" <talayan at redhat.com>
> > To: "Andrew Lau" <andrew at andrewklau.com>, "Eli Mesika" <
> emesika at redhat.com>
> > Cc: dron at redhat.com, "Karli Sjöberg" <Karli.Sjoberg at slu.se>,
> users at ovirt.org
> > Sent: Monday, January 27, 2014 2:59:02 PM
> > Subject: Re: [Users] two node ovirt cluster with HA
> >
> > Adding Eli.
>
> I just want to summarize the requirement as I understand it:
>
> In the case that a Host that is running HA VMs and has PM configured is
> turned off manually:
>
> 1) The non-responsive treatment should be modified to check Host status
> via PM agent
> 2) If the Host is off, HA VMs will attempt to run on another host ASAP
> 3) The host status should be set to DOWN
> 4) No attempt to restart vdsm (soft fencing) or restart the host (hard
> fencing) will be done
>
> Is the above correct? If so, an RFE for it can be opened.
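
The four steps above can be sketched in Python as follows. This is only an illustration of the proposed flow, not engine code: `PmStatus`, `Host` and `handle_non_responsive` are hypothetical names, and the fallback branch simply stands for whatever the current non-responsive treatment does.

```python
from dataclasses import dataclass, field
from enum import Enum


class PmStatus(Enum):
    """Answer from the power management (PM) agent; names are assumptions."""
    ON = "on"
    OFF = "off"
    UNREACHABLE = "unreachable"  # the PM agent itself did not answer


@dataclass
class Host:
    name: str
    status: str = "NON_RESPONSIVE"
    ha_vms: list = field(default_factory=list)


def handle_non_responsive(host, pm_status):
    """Return the actions to take for a non-responsive host.

    Step 1 (checking the host status via the PM agent) is assumed to have
    been done by the caller, who passes the result in as pm_status.
    """
    if pm_status is PmStatus.OFF:
        # Step 2: HA VMs are restarted on another host as soon as possible.
        actions = [f"restart {vm} on another host" for vm in host.ha_vms]
        # Step 3: the host status is set to DOWN.
        host.status = "DOWN"
        # Step 4: no soft fencing (vdsm restart) and no hard fencing (reboot).
        return actions
    # Host is on, or the PM agent is unreachable: keep today's behaviour.
    return ["existing non-responsive treatment"]
```

With a host that PM reports as off, the function returns one restart action per HA VM and marks the host DOWN; in every other case the host is left for the existing treatment.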
>
> >
> >
> > On 01/27/2014 02:50 PM, Andrew Lau wrote:
> > > Hi,
> > >
> > > I think he was asking what if the power management device reported
> > > that the host was powered off. Then VMs should be brought back up as
> > > being off would essentially be the same as running a power
> cycle/reboot?
> > >
> > > Another example I'm seeing is what happens if the whole host loses
> > > > power and its power management device then becomes unavailable (i.e.
> > > not reachable) then you're stuck in the case where it requires manual
> > > intervention.
> > >
> > > I would be interested to potentially see something like a timeout on
> > > > those problematic VMs (e.g. if nothing was read or written after x amount
> > > of time) then you could consider the host as offline? I guess then
> > > that adds a lot of risk..
> > >
> > >
> > > > On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <talayan at redhat.com>
> > > > wrote:
> > >
> > >     Hi,
> > >
> > >     Power management makes use of special *dedicated* hardware in
> > >     order to restart hosts independently of host OS. The engine
> > > >     connects to a power management device using a *dedicated* network
> > > >     IP address.
> > > >     The engine is capable of rebooting hosts that have entered a
> > > >     non-operational or non-responsive state.
> > >     The abilities provided by all power management devices are: check
> > >     status, start, stop and recycle (restart)...
> > >
> > > >     In the case of a non-responsive host, all of the VMs that are
> > > >     currently running on that host can also become non-responsive.
> > > >     However, the non-responsive host keeps its lock on the hard disk
> > > >     images of all VMs it is running. Attempting to start a VM on a
> > > >     different host and assigning the second host write privileges for
> > > >     the virtual machine hard disk image can cause data corruption.
> > >     Rebooting allows the engine to assume that the lock on a VM hard
> > >     disk image has been released.
> > >     The engine can know for sure that the problematic host has been
> > >     rebooted via the power management device and then it can start a
> > >     VM from the problematic host on another host without risking data
> > >     corruption.
> > >     Important note: A virtual machine that has been marked
> > > >     highly-available cannot be safely started on a different host
> > >     without the certainty that doing so will not cause data corruption.
> > >
> > >     N-joy,
> > >
> > >     --Tareq
> > >
> > >
> > >
> > >
> > >     On 01/27/2014 02:05 PM, Dafna Ron wrote:
> > >
> > >         I am adding Tareq for the Power Management implementation.
> > >
> > >         Dafna
> > >
> > >
> > >         On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
> > >
> > >             On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
> > >
> > >                 Powering off the host will never trigger vm migration.
> > >                 As far as engine is concerned it just lost connection
> > >                 to the host, but
> > >                 has no way of telling if the host is down or if a
> > >                 router is down.
> > >
> > > >             Can't it at least check with power management if the Host
> > >             status is down
> > >             first?
> > >
> > >             I mean, if the network is down there will be no response
> > >             from either PM
> > >             or Host. But if PM is up and can tell you that the Host is
> > >             down, sounds
> > >             rather clear cut to me...
> > >
> > > >             Seems to me the VMs would be restarted sooner if the flow
> > > >             was altered
> > > >             to first check with PM if it's a network or Host issue,
> > > >             and if it is a Host
> > > >             issue, immediately restart the VMs on another Host, instead
> > > >             of waiting for
> > > >             a potentially problematic Host to boot up eventually.
> > >
> > >             /K
> > >
> > >                 since vm's can continue running on the host even if
> > >                 engine has no access
> > >                 to it, starting the vm's on the second host can cause
> > >                 split brain and
> > >                 data corruption.
> > >
> > > >                 The way that the engine knows what's going on is by
> > > >                 sending health check
> > > >                 queries to the vdsm.
> > > >                 Power management will try to reboot a host when the
> > > >                 health checks to
> > > >                 vdsm are not answered.
> > >                 So... if engine gets no reply and has no way of
> > >                 rebooting the host, the
> > >                 host status will be changed to Non-Responsive and the
> > >                 vm's will be
> > >                 unknown because engine has no way of knowing what's
> > >                 happening with the
> > >                 vm's.
> > >                 Since reboot of the host will kill the vm's running on
> > >                 it - this will
> > >                 never cause any vm migration but... along with the
> > >                 High-Availability vm
> > >                 feature, you will be able to have some of the vm's
> > >                 re-started on the
> > >                 second host after the host reboot (and that is only if
> > >                 Power Management
> > >                 was confirmed as successful).
> > >
> > >                 VM migration is only triggered when:
> > >                 1. Cluster configuration states that the vm should be
> > >                 migrated in case
> > >                 of failure
> > >                 2. Engine has access to the host - so the failure is
> > >                 on the storage side
> > >                 and not the host side.
> > >                 3. the vms are not actively writing (although there
> > >                 might be a new RFE
> > >                 for it).
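
The three conditions above can be read as a single predicate. A minimal sketch, with hypothetical parameter names rather than real engine settings:

```python
def migration_triggered(policy_migrates_on_failure: bool,
                        engine_reaches_host: bool,
                        vm_actively_writing: bool) -> bool:
    """True only when all three listed conditions hold."""
    return (
        policy_migrates_on_failure      # 1. cluster configuration asks for it
        and engine_reaches_host         # 2. failure is on the storage side
        and not vm_actively_writing     # 3. the VM is not actively writing
    )
```

For a host that was powered off, `engine_reaches_host` is False, so the predicate is False whatever the cluster policy says. That matches the behaviour reported at the start of this thread.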
> > >
> > >                 hope this clears things up
> > >
> > >                 Dafna
> > >
> > >
> > >
> > >                 On 01/27/2014 10:11 AM, Andrew Lau wrote:
> > >
> > >                     Hi,
> > >
> > >                     Have you got power management enabled?
> > >
> > >                     That's the fencing feature required for the engine
> > >                     to ensure that the
> > >                     host is actually offline. It won't resume any
> > >                     other VMs to prevent
> > >                     potential VM corruption (eg. VM running on
> > >                     multiple hosts).
> > >
> > >                     Andrew.
> > >
> > >                     On Jan 27, 2014 5:12 PM, "Jaison peter"
> > >                     <urotrip2 at gmail.com <mailto:urotrip2 at gmail.com>
> > >                     <mailto:urotrip2 at gmail.com
> > >                     <mailto:urotrip2 at gmail.com>>> wrote:
> > >
> > >                          Hi all ,
> > >
> > >                          I was setting a two node ovirt cluster with
> > >                     ovirt engine on
> > >                          seperate node . I completed the configuration
> > >                     and tested VM  live
> > >                          migrations with out any issues . Then for
> > >                     checking cluster HA I
> > >                          powered down one host and expected vms
> > >                     running on that host to be
> > >                          migrated to the other one . But nothing
> > >                     happened , Engine detected
> > >                          host as un-rechable and marked it as
> > >                     non-operational and vm ran on
> > >                          that host went to 'unknown state' . Is that
> > >                     not possible to setup
> > >                          a fully HA ovirt cluster with two nodes ? or
> > >                     else is that my
> > >                          configuration problem ? please advice .
> > >
> > >                          Thanks & Regards
> > >
> > >                          Alex
> > >
> > >
> > > >                          _______________________________________________
> > > >                          Users mailing list
> > > >                          Users at ovirt.org
> > > >                          http://lists.ovirt.org/mailman/listinfo/users
> > >
> > >
> > >
> > >                     _______________________________________________
> > >                     Users mailing list
> > >                     Users at ovirt.org <mailto:Users at ovirt.org>
> > >                     http://lists.ovirt.org/mailman/listinfo/users
> > >
> > >
> > >                 --
> > >                 Dafna Ron
> > >                 _______________________________________________
> > >                 Users mailing list
> > >                 Users at ovirt.org <mailto:Users at ovirt.org>
> > >                 http://lists.ovirt.org/mailman/listinfo/users
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >