[Users] two node ovirt cluster with HA

Tareq Alayan talayan at redhat.com
Mon Jan 27 12:59:02 UTC 2014


Adding Eli.


On 01/27/2014 02:50 PM, Andrew Lau wrote:
> Hi,
>
> I think he was asking what happens if the power management device
> reports that the host is powered off. Shouldn't the VMs then be brought
> back up, since a host that is off is essentially in the same state as
> one that has just been power cycled/rebooted?
>
> Another case I'm seeing is what happens if the whole host loses power
> and its power management device then becomes unavailable (i.e. not
> reachable). Then you're stuck, because recovery requires manual
> intervention.
>
> I would be interested to see something like a timeout on those
> problematic VMs (e.g. if nothing was read or written for x amount of
> time), after which you could consider the host offline. I guess that
> adds a lot of risk, though.
>
>
> On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <talayan at redhat.com> wrote:
>
>     Hi,
>
>     Power management makes use of special *dedicated* hardware in
>     order to restart hosts independently of the host OS. The engine
>     connects to the power management device using a *dedicated*
>     network IP address.
>     The engine is capable of rebooting hosts that have entered a
>     non-operational or non-responsive state.
>     The abilities provided by all power management devices are: check
>     status, start, stop and recycle (restart).
>
>     In the case of a non-responsive host, all of the VMs that are
>     currently running on that host can also become non-responsive.
>     However, the non-responsive host keeps its lock on the hard disk
>     image of every VM it is running. Starting a VM on a different
>     host and giving that second host write privileges for the virtual
>     machine hard disk image can cause data corruption.
>     Rebooting allows the engine to assume that the lock on a VM hard
>     disk image has been released.
>     Once the power management device confirms that the problematic
>     host has been rebooted, the engine knows for sure that it can
>     start a VM from the problematic host on another host without
>     risking data corruption.
>     Important note: A virtual machine that has been marked
>     highly-available cannot be safely started on a different host
>     without the certainty that doing so will not cause data corruption.
>
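A minimal sketch of the rule above, in Python. This is not engine code: the
names FenceResult and vms_safe_to_restart, and the dict layout of a VM, are
made up for illustration. It only shows the gate: HA VMs are started
elsewhere only after the power management device confirms the reboot.

```python
from enum import Enum


class FenceResult(Enum):
    # Hypothetical outcomes of asking the power management device to restart a host.
    CONFIRMED = "confirmed"      # device reported that the restart succeeded
    FAILED = "failed"            # device answered, but the restart failed
    UNREACHABLE = "unreachable"  # device did not answer at all


def vms_safe_to_restart(vms, fence_result):
    """Return the names of the VMs that may be started on another host.

    Each VM is a dict with the illustrative keys "name" and "highly_available".
    Without a confirmed fence the old host may still hold the disk locks,
    so nothing is restarted.
    """
    if fence_result is not FenceResult.CONFIRMED:
        return []
    return [vm["name"] for vm in vms if vm["highly_available"]]
```

With an unreachable power management device the list is always empty, which
is the "manual intervention" case mentioned earlier in the thread.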
>     N-joy,
>
>     --Tareq
>
>
>
>
>     On 01/27/2014 02:05 PM, Dafna Ron wrote:
>
>         I am adding Tareq for the Power Management implementation.
>
>         Dafna
>
>
>         On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
>
>             On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
>
>                 Powering off the host will never trigger vm migration.
>                 As far as engine is concerned it just lost connection
>                 to the host, but
>                 has no way of telling if the host is down or if a
>                 router is down.
>
>             Can't it at least check with power management whether the
>             Host status is down first?
>
>             I mean, if the network is down there will be no response
>             from either PM or Host. But if PM is up and can tell you
>             that the Host is down, that sounds rather clear-cut to me...
>
>             Seems to me the VMs would be restarted sooner if the flow
>             was altered to first check with PM whether it's a network
>             or a Host issue and, if it is a Host issue, immediately
>             restart the VMs on another Host, instead of waiting for a
>             potentially problematic Host to boot up eventually.
>
>             /K
>
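To make the proposed flow concrete, here is a rough Python sketch of it. It
describes the suggestion above, not what the engine does today, and the
function name, arguments and returned strings are all invented for this
example.

```python
from typing import Optional


def next_action(host_responds: bool, pm_status: Optional[str]) -> str:
    """Decide what to do about a host, following the proposed PM-first flow.

    pm_status is "on" or "off" as reported by the power management device,
    or None when the device itself did not answer.
    """
    if host_responds:
        return "none"
    if pm_status is None:
        # Neither host nor PM answers: could be the network or a total
        # power loss, so nothing can be assumed about the VM disk locks.
        return "manual-intervention"
    if pm_status == "off":
        # PM is reachable and says the host is powered off.
        return "restart-ha-vms-elsewhere"
    # PM says the host is powered on but it does not respond: fence it first.
    return "fence-then-restart-ha-vms"
```

The None branch is the case where both the host and its power management
device are unreachable, which this flow cannot resolve on its own.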
>                 Since VMs can continue running on the host even if
>                 engine has no access to it, starting the VMs on the
>                 second host can cause split brain and data corruption.
>
>                 The way that the engine knows what's going on is by
>                 sending health check queries to vdsm.
>                 Power management will try to reboot a host when the
>                 health checks to vdsm are not answered.
>                 So... if engine gets no reply and has no way of
>                 rebooting the host, the host status will be changed to
>                 Non-Responsive and the VMs will be unknown, because
>                 engine has no way of knowing what's happening with
>                 them.
>                 Since a reboot of the host will kill the VMs running
>                 on it, this will never cause any VM migration. But
>                 with the High-Availability VM feature, you will be
>                 able to have some of the VMs restarted on the second
>                 host after the host reboot (and only if Power
>                 Management was confirmed as successful).
>
>                 VM migration is only triggered when:
>                 1. Cluster configuration states that the VM should be
>                    migrated in case of failure.
>                 2. Engine has access to the host, so the failure is on
>                    the storage side and not the host side.
>                 3. The VMs are not actively writing (although there
>                    might be a new RFE for it).
>
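The three conditions in the list above can be written as a single check. A
small Python sketch, with a made-up function name and made-up parameter
names, only to show that all three must hold at once:

```python
def should_migrate(cluster_policy_migrates: bool,
                   engine_reaches_host: bool,
                   vm_actively_writing: bool) -> bool:
    """Return True only when all three listed conditions are met.

    1. The cluster configuration asks for migration on failure.
    2. The engine can still reach the host (so the failure is on the
       storage side, not the host side).
    3. The VM is not actively writing.
    """
    return (cluster_policy_migrates
            and engine_reaches_host
            and not vm_actively_writing)
```

A powered-off host fails the second condition, which is why powering a host
down never results in a migration.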
>                 hope this clears things up
>
>                 Dafna
>
>
>
>                 On 01/27/2014 10:11 AM, Andrew Lau wrote:
>
>                     Hi,
>
>                     Have you got power management enabled?
>
>                     That's the fencing feature the engine requires in
>                     order to be sure that the host is actually offline.
>                     Without that certainty it won't restart the VMs
>                     elsewhere, to prevent potential VM corruption (e.g.
>                     a VM running on multiple hosts).
>
>                     Andrew.
>
>                     On Jan 27, 2014 5:12 PM, "Jaison peter"
>                     <urotrip2 at gmail.com> wrote:
>
>                          Hi all,
>
>                          I was setting up a two-node oVirt cluster with
>                          the oVirt engine on a separate node. I completed
>                          the configuration and tested VM live migrations
>                          without any issues. Then, to check cluster HA, I
>                          powered down one host and expected the VMs
>                          running on that host to be migrated to the other
>                          one. But nothing happened: the engine detected
>                          the host as unreachable and marked it as
>                          non-operational, and the VMs that ran on that
>                          host went to 'unknown state'. Is it not possible
>                          to set up a fully HA oVirt cluster with two
>                          nodes? Or is it a problem with my configuration?
>                          Please advise.
>
>                          Thanks & Regards
>
>                          Alex
>
>                          _______________________________________________
>                          Users mailing list
>                          Users at ovirt.org
>                          http://lists.ovirt.org/mailman/listinfo/users
>
>
>
>
>
>                 -- 
>                 Dafna Ron
>
>
