[Users] two node ovirt cluster with HA
Tareq Alayan
talayan at redhat.com
Mon Jan 27 12:59:02 UTC 2014
Adding Eli.
On 01/27/2014 02:50 PM, Andrew Lau wrote:
> Hi,
>
> I think he was asking what happens if the power management device reports
> that the host is powered off. Shouldn't the VMs then be brought back up,
> since being off is essentially the same as having run a power cycle/reboot?
>
> Another example I'm seeing is what happens if the whole host loses
> power and its power management device then becomes unavailable (i.e.
> not reachable). Then you're stuck in the case where it requires manual
> intervention.
>
> I would be interested to potentially see something like a timeout on
> those problematic VMs (e.g. if nothing was read or written after x amount
> of time), after which you could consider the host as offline? I guess
> that adds a lot of risk, though.
>
>
> On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <talayan at redhat.com> wrote:
>
> Hi,
>
> Power management makes use of special *dedicated* hardware in
> order to restart hosts independently of the host OS. The engine
> connects to a power management device using a *dedicated* network
> IP address.
> The engine is capable of rebooting hosts that have entered a
> non-operational or non-responsive state.
> The abilities provided by all power management devices are: check
> status, start, stop and recycle (restart)...
>
> In the case of a non-responsive host: all of the VMs that are
> currently running on that host can also become non-responsive.
> However, the non-responsive host keeps its lock on the hard disk
> image of every VM it is running. Attempting to start a VM on a different
> host and assigning the second host write privileges for the virtual
> machine hard disk image can cause data corruption.
> Rebooting allows the engine to assume that the lock on a VM hard
> disk image has been released.
> Once the engine knows for sure, via the power management device, that
> the problematic host has been rebooted, it can start a VM from the
> problematic host on another host without risking data corruption.
> Important note: A virtual machine that has been marked
> highly-available cannot be safely started on a different host
> without the certainty that doing so will not cause data corruption.
>
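
The rule described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not engine code: `Host`, `vms_safe_to_restart` and the `fence` callback are hypothetical names, and `fence` stands in for "ask the power management device to restart the host and report whether that succeeded".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Host:
    """Hypothetical stand-in for a hypervisor host as seen by the engine."""
    name: str
    responsive: bool
    ha_vms: List[str] = field(default_factory=list)


def vms_safe_to_restart(host: Host, fence: Callable[[Host], bool]) -> List[str]:
    """Return the HA VMs that may be started on another host.

    `fence` is an assumed callback that reboots the host through its power
    management device and returns True only if the reboot was confirmed.
    """
    if host.responsive:
        # The host still answers, so nothing needs to be restarted elsewhere.
        return []
    if not fence(host):
        # No confirmed reboot: the host may still hold the disk locks, so
        # starting its VMs elsewhere would risk data corruption.
        return []
    # Reboot confirmed: the locks can be assumed released.
    return list(host.ha_vms)
```

For example, `vms_safe_to_restart(Host("node1", False, ["vm1"]), lambda h: False)` returns an empty list, because the reboot was never confirmed.
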
> N-joy,
>
> --Tareq
>
>
>
>
> On 01/27/2014 02:05 PM, Dafna Ron wrote:
>
> I am adding Tareq for the Power Management implementation.
>
> Dafna
>
>
> On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
>
> On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
>
> Powering off the host will never trigger VM migration.
> As far as the engine is concerned, it just lost connection to the host, but
> it has no way of telling if the host is down or if a router is down.
>
> Can't it at least check with power management if the Host status is down
> first?
>
> I mean, if the network is down there will be no response from either PM
> or Host. But if PM is up and can tell you that the Host is down, that sounds
> rather clear cut to me...
>
> Seems to me the VMs would be restarted sooner if the flow was altered
> to first check with PM if it's a network or Host issue, and if it's a Host
> issue, immediately restart the VMs on another Host, instead of waiting for
> a potentially problematic Host to boot up eventually.
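
The flow proposed here (it is a suggestion in this thread, not how the engine is described as behaving) could be written down roughly as follows. `Action`, `proposed_action` and the string values for the power state are assumptions made for the sketch. `None` means the power management device itself did not answer.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NOTHING = "host is fine"
    RESTART_VMS_ELSEWHERE = "PM confirms the host is off"
    FENCE_HOST = "PM reachable, host powered on but silent"
    WAIT_FOR_OPERATOR = "neither host nor PM reachable"


def proposed_action(host_answers: bool, pm_power_state: Optional[str]) -> Action:
    """Decide what to do under the 'ask PM first' proposal.

    pm_power_state is "on", "off", or None when the power management
    device could not be reached (hypothetical encoding).
    """
    if host_answers:
        return Action.NOTHING
    if pm_power_state is None:
        # Could be a network problem, or the PM device lost power with the
        # host; the engine cannot tell which, so nothing is safe to automate.
        return Action.WAIT_FOR_OPERATOR
    if pm_power_state == "off":
        # PM is up and says the host is down: no running VMs, no disk locks.
        return Action.RESTART_VMS_ELSEWHERE
    # Host has power but does not answer: reboot it before touching its VMs.
    return Action.FENCE_HOST
```

The `WAIT_FOR_OPERATOR` branch is the case where both the host and its power management device are unreachable, which is why that situation needs manual intervention.
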
>
> /K
>
> Since VMs can continue running on the host even if the engine has no access
> to it, starting the VMs on the second host can cause split brain and
> data corruption.
>
> The way that the engine knows what's going on is by sending health check
> queries to vdsm.
> Power management will try to reboot a host when the health checks to
> vdsm are not answered.
> So... if the engine gets no reply and has no way of rebooting the host, the
> host status will be changed to Non-Responsive and the VMs will be
> unknown because the engine has no way of knowing what's happening with the
> VMs.
> Since a reboot of the host will kill the VMs running on it, this will
> never cause any VM migration but... along with the High-Availability VM
> feature, you will be able to have some of the VMs re-started on the
> second host after the host reboot (and that is only if Power Management
> was confirmed as successful).
>
> VM migration is only triggered when:
> 1. The cluster configuration states that the VM should be migrated in case
> of failure.
> 2. The engine has access to the host, so the failure is on the storage side
> and not the host side.
> 3. The VMs are not actively writing (although there might be a new RFE
> for it).
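
Read as a single condition, the three points above all have to hold at once. A tiny Python sketch, with `migration_triggered` and its parameter names invented for illustration:

```python
def migration_triggered(
    cluster_migrates_on_failure: bool,
    engine_reaches_host: bool,
    vm_actively_writing: bool,
) -> bool:
    """True only when all three conditions from the list above are met."""
    return (
        cluster_migrates_on_failure   # 1. cluster policy asks for migration
        and engine_reaches_host       # 2. host is reachable (storage-side failure)
        and not vm_actively_writing   # 3. the VM is not actively writing
    )
```

A powered-off host fails the second condition, so `migration_triggered(True, False, False)` is `False`; restarting HA VMs after a confirmed fence is a separate path from migration.
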
>
> hope this clears things up
>
> Dafna
>
>
>
> On 01/27/2014 10:11 AM, Andrew Lau wrote:
>
> Hi,
>
> Have you got power management enabled?
>
> That's the fencing feature required for the engine to ensure that the
> host is actually offline. It won't resume any other VMs to prevent
> potential VM corruption (e.g. a VM running on multiple hosts).
>
> Andrew.
>
> On Jan 27, 2014 5:12 PM, "Jaison peter" <urotrip2 at gmail.com> wrote:
>
> Hi all,
>
> I was setting up a two-node oVirt cluster with the oVirt engine on a
> separate node. I completed the configuration and tested VM live
> migrations without any issues. Then, to check cluster HA, I
> powered down one host and expected the VMs running on that host to be
> migrated to the other one. But nothing happened: the engine detected
> the host as unreachable and marked it as non-operational, and the VMs
> running on that host went to 'unknown state'. Is it not possible to set up
> a fully HA oVirt cluster with two nodes? Or is it a problem with my
> configuration? Please advise.
>
> Thanks & Regards
>
> Alex
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
>
> --
> Dafna Ron