[Users] two node ovirt cluster with HA

Dafna Ron dron at redhat.com
Mon Jan 27 13:02:19 UTC 2014


Andrew,
Once this discussion is finished, if what you would like done is not in 
the current implementation, can you please open a bug/feature request for 
it?

Thanks,

Dafna

On 01/27/2014 12:59 PM, Tareq Alayan wrote:
> Adding Eli.
>
>
> On 01/27/2014 02:50 PM, Andrew Lau wrote:
>> Hi,
>>
>> I think he was asking what happens if the power management device 
>> reports that the host is powered off. Shouldn't the VMs then be brought 
>> back up, as being off is essentially the same as having run a power 
>> cycle/reboot?
>>
>> Another example I'm seeing is what happens if the whole host loses 
>> power and its power management device then becomes unavailable (i.e. 
>> not reachable): you're then stuck in a case that requires manual 
>> intervention.
>>
>> I would be interested to potentially see something like a timeout on 
>> those problematic VMs (e.g. if nothing was read or written after x 
>> amount of time), after which you could consider the host offline? I 
>> guess that adds a lot of risk, though.
>>
>>
>> On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <talayan at redhat.com> wrote:
>>
>>     Hi,
>>
>>     Power management makes use of special *dedicated* hardware in
>>     order to restart hosts independently of the host OS. The engine
>>     connects to a power management device using a *dedicated*
>>     network IP address.
>>     The engine is capable of rebooting hosts that have entered a
>>     non-operational or non-responsive state.
>>     The abilities provided by all power management devices are: check
>>     status, start, stop and recycle (restart)...
>>
>>     In the case of a non-responsive host: all of the VMs that are
>>     currently running on that host can also become non-responsive.
>>     However, the non-responsive host keeps its lock on the hard disk
>>     image of every VM it is running. Attempting to start a VM on a
>>     different host and assigning the second host write privileges for
>>     the virtual machine hard disk image can cause data corruption.
>>     Rebooting allows the engine to assume that the lock on a VM hard
>>     disk image has been released.
>>     The engine can know for sure that the problematic host has been
>>     rebooted via the power management device, and then it can start a
>>     VM from the problematic host on another host without risking data
>>     corruption.
>>     Important note: A virtual machine that has been marked
>>     highly-available cannot be safely started on a different host
>>     without the certainty that doing so will not cause data corruption.
>>
>>     N-joy,
>>
>>     --Tareq
>>
>>
>>
>>
>>     On 01/27/2014 02:05 PM, Dafna Ron wrote:
>>
>>         I am adding Tareq for the Power Management implementation.
>>
>>         Dafna
>>
>>
>>         On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
>>
>>             On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
>>
>>                 Powering off the host will never trigger vm migration.
>>                 As far as engine is concerned it just lost connection
>>                 to the host, but has no way of telling if the host is
>>                 down or if a router is down.
>>
>>             Can't it at least check with power management if the Host
>>             status is down first?
>>
>>             I mean, if the network is down there will be no response
>>             from either PM or Host. But if PM is up and can tell you
>>             that the Host is down, sounds rather clear cut to me...
>>
>>             Seems to me the VMs would be restarted sooner if the flow
>>             was altered to first check with PM if it's a network or
>>             Host issue, and if it's a Host issue, immediately restart
>>             the VMs on another Host, instead of waiting for a
>>             potentially problematic Host to boot up eventually.
>>
>>             /K
>>
>>                 Since VMs can continue running on the host even if
>>                 engine has no access to it, starting the VMs on the
>>                 second host can cause split brain and data corruption.
>>
>>                 The way that the engine knows what's going on is by
>>                 sending health check queries to vdsm.
>>                 Power management will try to reboot a host when the
>>                 health checks to vdsm are not answered.
>>                 So... if engine gets no reply and has no way of
>>                 rebooting the host, the host status will be changed to
>>                 Non-Responsive and the VMs will be unknown, because
>>                 engine has no way of knowing what's happening with
>>                 them.
>>                 Since a reboot of the host will kill the VMs running
>>                 on it, this will never cause any VM migration. But
>>                 along with the High-Availability VM feature, you will
>>                 be able to have some of the VMs restarted on the
>>                 second host after the host reboot (and only if Power
>>                 Management was confirmed as successful).
>>
>>                 VM migration is only triggered when:
>>                 1. Cluster configuration states that the VM should be
>>                    migrated in case of failure.
>>                 2. Engine has access to the host, so the failure is on
>>                    the storage side and not the host side.
>>                 3. The VMs are not actively writing (although there
>>                    might be a new RFE for it).
>>
>>                 Hope this clears things up.
>>
>>                 Dafna
>>
>>
>>
>>                 On 01/27/2014 10:11 AM, Andrew Lau wrote:
>>
>>                     Hi,
>>
>>                     Have you got power management enabled?
>>
>>                     That's the fencing feature required for the
>>                     engine to ensure that the host is actually
>>                     offline. Without it, the engine won't restart the
>>                     VMs elsewhere, in order to prevent potential VM
>>                     corruption (e.g. a VM running on multiple hosts).
>>
>>                     Andrew.
>>
>>                     On Jan 27, 2014 5:12 PM, "Jaison peter"
>>                     <urotrip2 at gmail.com> wrote:
>>
>>                          Hi all,
>>
>>                          I was setting up a two-node oVirt cluster with
>>                          the oVirt engine on a separate node. I completed
>>                          the configuration and tested VM live migrations
>>                          without any issues. Then, to check cluster HA, I
>>                          powered down one host and expected the VMs
>>                          running on that host to be migrated to the other
>>                          one. But nothing happened: the engine detected
>>                          the host as unreachable and marked it as
>>                          non-operational, and the VMs that ran on that
>>                          host went to 'unknown state'. Is it not possible
>>                          to set up a fully HA oVirt cluster with two
>>                          nodes? Or is it a problem with my configuration?
>>                          Please advise.
>>
>>                          Thanks & Regards
>>
>>                          Alex
>>
>>                          _______________________________________________
>>                          Users mailing list
>>                          Users at ovirt.org
>>                          http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>>
>>
>>                 -- 
>>                 Dafna Ron
>>
>


-- 
Dafna Ron
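
The fencing flow described in the quoted messages can be summarised as a small decision function. This is only a sketch of the reasoning in the thread, not oVirt engine code: the names `Action`, `decide`, `pm_reports_off` and `trust_pm_off` are hypothetical, and `trust_pm_off` models the change proposed by Karli and Andrew (treating a "powered off" status from the PM device as equivalent to a confirmed fence), which the thread does not describe as implemented behaviour.

```python
from enum import Enum


class Action(Enum):
    NONE = "host healthy, nothing to do"
    RESTART_HA_VMS = "restart HA VMs on another host"
    MANUAL = "host Non-Responsive, VMs Unknown: manual intervention"


def decide(vdsm_responds, pm_reachable, fence_confirmed,
           pm_reports_off=False, trust_pm_off=False):
    """Illustrative model of the flow discussed in the thread (hypothetical API)."""
    # vdsm answers the health checks: the host is fine.
    if vdsm_responds:
        return Action.NONE
    # The PM device is unreachable too: the engine cannot tell a dead host
    # from a network problem, so restarting the VMs would risk split brain.
    if not pm_reachable:
        return Action.MANUAL
    # A reboot through the PM device was confirmed: the disk locks are gone.
    if fence_confirmed:
        return Action.RESTART_HA_VMS
    # Proposed behaviour only: accept "powered off" as proof the host is down.
    if trust_pm_off and pm_reports_off:
        return Action.RESTART_HA_VMS
    return Action.MANUAL


print(decide(vdsm_responds=False, pm_reachable=True, fence_confirmed=True).value)
```

With the flag left at its default, a host that the PM device reports as powered off but that was not fenced still ends in the manual-intervention case, which is the situation Jaison hit.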


