[Users] two node ovirt cluster with HA

Tareq Alayan talayan at redhat.com
Mon Jan 27 12:43:29 UTC 2014


Hi,

Power management uses special *dedicated* hardware to restart hosts 
independently of the host OS. The engine connects to a power 
management device using a *dedicated* network IP address.
The engine is capable of rebooting hosts that have entered a 
non-operational or non-responsive state.
All power management devices provide the same abilities: check 
status, start, stop and recycle (restart).
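
To make the four abilities concrete, here is a rough Python sketch of 
them as an interface. FakePowerDevice and PowerState are made-up names 
for illustration only; this is not the engine's code or any real fence 
agent API.

```python
from enum import Enum


class PowerState(Enum):
    ON = "on"
    OFF = "off"


class FakePowerDevice:
    """Hypothetical stand-in for a power management device (not an oVirt API)."""

    def __init__(self, state=PowerState.ON):
        self._state = state

    def status(self):
        # Report the power state as seen by the device, not by the host OS.
        return self._state

    def start(self):
        self._state = PowerState.ON

    def stop(self):
        self._state = PowerState.OFF

    def restart(self):
        # A recycle is modelled here as a stop followed by a start.
        self.stop()
        self.start()
```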

In the case of a non-responsive host: all of the VMs that are currently 
running on that host can also become non-responsive. However, the 
non-responsive host keeps its lock on the hard disk image of every VM it 
is running. Starting one of those VMs on a different host and giving that 
second host write access to the same hard disk image can cause data 
corruption.
Rebooting the host allows the engine to assume that the lock on a VM hard 
disk image has been released.
Once the engine knows for sure, via the power management device, that the 
problematic host has been rebooted, it can start that host's VMs on 
another host without risking data corruption.
Important note: a virtual machine that has been marked highly available 
cannot be safely started on a different host without the certainty that 
doing so will not cause data corruption.
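
The rule above can be sketched as a small decision function. This is 
only an illustration of the reasoning, assuming three simplified boolean 
inputs; the function name and parameters are hypothetical and do not 
come from the engine.

```python
def can_restart_vms_elsewhere(host_responsive, fencing_configured, fence_succeeded):
    """Return True only when it is safe to start the host's HA VMs on another host.

    All three parameters are simplified, hypothetical inputs.
    """
    if host_responsive:
        # The host still answers; there is nothing to fence.
        return False
    if not fencing_configured:
        # Without power management nobody can prove the host is down,
        # so it may still hold locks on the VM disk images.
        return False
    # Only a confirmed reboot lets us assume the locks were released.
    return fence_succeeded
```

With no power management configured the function always answers False 
for a non-responsive host, which matches the 'unknown' VM state 
described in this thread.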

N-joy,

--Tareq



On 01/27/2014 02:05 PM, Dafna Ron wrote:
> I am adding Tareq for the Power Management implementation.
>
> Dafna
>
>
> On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
>> On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
>>> Powering off the host will never trigger vm migration.
>>> As far as engine is concerned it just lost connection to the host, but
>>> has no way of telling if the host is down or if a router is down.
>> Can't it at least check with power management if the Host status is down
>> first?
>>
>> I mean, if the network is down there will be no response from either PM
>> or Host. But if PM is up and can tell you that the Host is down, sounds
>> rather clear cut to me...
>>
>> Seems to me the VM's would be restarted sooner if the flow was altered
>> to first check with PM if it's a network or Host issue, and if Host
>> issue, immediately restart VM's on another Host, instead of waiting for
>> a potentially problematic Host to boot up eventually.
>>
>> /K
>>
>>> Since vm's can continue running on the host even if engine has no
>>> access to it, starting the vm's on the second host can cause split
>>> brain and data corruption.
>>>
>>> The way that the engine knows what's going on is by sending health
>>> check queries to the vdsm.
>>> Power management will try to reboot a host when the health checks to
>>> vdsm are not answered.
>>> So... if engine gets no reply and has no way of rebooting the host, the
>>> host status will be changed to Non-Responsive and the vm's will be
>>> unknown because engine has no way of knowing what's happening with the
>>> vm's.
>>> Since reboot of the host will kill the vm's running on it - this will
>>> never cause any vm migration but... along with the High-Availability vm
>>> feature, you will be able to have some of the vm's re-started on the
>>> second host after the host reboot (and that is only if Power Management
>>> was confirmed as successful).
>>>
>>> VM migration is only triggered when:
>>> 1. Cluster configuration states that the vm should be migrated in case
>>> of failure
>>> 2. Engine has access to the host - so the failure is on the storage
>>> side and not the host side.
>>> 3. the vms are not actively writing (although there might be a new RFE
>>> for it).
>>>
>>> hope this clears things up
>>>
>>> Dafna
>>>
>>>
>>>
>>> On 01/27/2014 10:11 AM, Andrew Lau wrote:
>>>> Hi,
>>>>
>>>> Have you got power management enabled?
>>>>
>>>> That's the fencing feature required for the engine to ensure that the
>>>> host is actually offline. It won't resume any other VMs to prevent
>>>> potential VM corruption (eg. VM running on multiple hosts).
>>>>
>>>> Andrew.
>>>>
>>>> On Jan 27, 2014 5:12 PM, "Jaison peter" <urotrip2 at gmail.com
>>>> <mailto:urotrip2 at gmail.com>> wrote:
>>>>
>>>>      Hi all ,
>>>>
>>>>      I was setting up a two node ovirt cluster with the ovirt engine
>>>>      on a separate node. I completed the configuration and tested VM
>>>>      live migrations without any issues. Then, to check cluster HA, I
>>>>      powered down one host and expected the vms running on that host
>>>>      to be migrated to the other one. But nothing happened: the engine
>>>>      detected the host as unreachable and marked it as non-operational,
>>>>      and the vms that ran on that host went to 'unknown state'. Is it
>>>>      not possible to set up a fully HA ovirt cluster with two nodes?
>>>>      Or is it a problem with my configuration? Please advise.
>>>>
>>>>      Thanks & Regards
>>>>
>>>>      Alex
>>>>
>>>>      _______________________________________________
>>>>      Users mailing list
>>>>      Users at ovirt.org <mailto:Users at ovirt.org>
>>>>      http://lists.ovirt.org/mailman/listinfo/users
>>>>
>>>>
>>>>
>>>
>>> -- 
>>> Dafna Ron
>>
>>
>
>



