Hi,
Power management uses special *dedicated* hardware to restart hosts
independently of the host OS. The engine connects to a power management
device using a *dedicated* network IP address.
The engine is capable of rebooting hosts that have entered a
non-operational or non-responsive state.
The operations provided by all power management devices are: check
status, start, stop and recycle (restart).
When a host becomes non-responsive, all of the VMs currently running on
it can also become non-responsive. However, the non-responsive host
still holds the lock on the hard disk image of every VM it is running.
Starting one of those VMs on a different host and giving that second
host write privileges for the VM's hard disk image can cause data
corruption.
Rebooting allows the engine to assume that the lock on a VM hard disk
image has been released.
Once the power management device confirms that the problematic host has
been rebooted, the engine can start that host's VMs on another host
without risking data corruption.
Important note: A virtual machine that has been marked highly-available
cannot be safely started on a different host without the certainty that
doing so will not cause data corruption.
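
To make the flow above concrete, here is a rough Python sketch. It is not
the engine's actual code: FakePowerManagement, Host, Vm and
vms_safe_to_restart are hypothetical names I made up for illustration.
It only models the rule that highly-available VMs are restarted elsewhere
after the power management device has confirmed the reboot.

```python
from dataclasses import dataclass, field
from enum import Enum


class PowerStatus(Enum):
    ON = "on"
    OFF = "off"
    UNKNOWN = "unknown"  # the power management device did not answer


@dataclass
class Vm:
    name: str
    highly_available: bool = False


@dataclass
class Host:
    name: str
    responsive: bool
    vms: list = field(default_factory=list)


class FakePowerManagement:
    """Hypothetical stand-in for a real power management device."""

    def __init__(self, reachable=True):
        self.reachable = reachable
        self.power = PowerStatus.ON

    def status(self):
        return self.power if self.reachable else PowerStatus.UNKNOWN

    def stop(self):
        if self.reachable:
            self.power = PowerStatus.OFF
        return self.reachable

    def start(self):
        if self.reachable:
            self.power = PowerStatus.ON
        return self.reachable

    def restart(self):
        # Recycle = stop followed by start; it succeeds only if both do.
        return self.stop() and self.start()


def vms_safe_to_restart(host, pm):
    """Return the names of VMs that may be started on another host."""
    if host.responsive:
        return []  # host is fine, nothing to recover
    if not pm.restart():
        return []  # reboot not confirmed: disk locks may still be held
    return [vm.name for vm in host.vms if vm.highly_available]
```

For example, a non-responsive host running an HA VM "web" and a non-HA VM
"scratch" yields ["web"] when the device confirms the reboot, and an empty
list when the device cannot be reached.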
N-joy,
--Tareq
On 01/27/2014 02:05 PM, Dafna Ron wrote:
I am adding Tareq for the Power Management implementation.
Dafna
On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
> On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
>> Powering off the host will never trigger vm migration.
>> As far as engine is concerned it just lost connection to the host, but
>> has no way of telling if the host is down or if a router is down.
> Can't it at least check with power management if the Host status is down
> first?
>
> I mean, if the network is down there will be no response from either PM
> or Host. But if PM is up and can tell you that the Host is down, sounds
> rather clear cut to me...
>
> Seems to me the VM's would be restarted sooner if the flow was altered
> to first check with PM if it's a network or Host issue, and if Host
> issue, immediately restart VM's on another Host, instead of waiting for
> a potentially problematic Host to boot up eventually.
>
> /K
>
>> since vm's can continue running on the host even if engine has no
>> access to it, starting the vm's on the second host can cause split
>> brain and data corruption.
>>
>> The way that the engine knows what's going on is by sending health check
>> queries to the vdsm.
>> Power management will try to reboot a host when the health checks to
>> vdsm are not answered.
>> So... if engine gets no reply and has no way of rebooting the host, the
>> host status will be changed to Non-Responsive and the vm's will be
>> unknown because engine has no way of knowing what's happening with the
>> vm's.
>> Since reboot of the host will kill the vm's running on it - this will
>> never cause any vm migration but... along with the High-Availability vm
>> feature, you will be able to have some of the vm's re-started on the
>> second host after the host reboot (and that is only if Power Management
>> was confirmed as successful).
>>
>> VM migration is only triggered when:
>> 1. Cluster configuration states that the vm should be migrated in case
>> of failure
>> 2. Engine has access to the host - so the failure is on the storage
>> side and not the host side.
>> 3. the vms are not actively writing (although there might be a new RFE
>> for it).
>>
>> hope this clears things up
>>
>> Dafna
>>
>>
>>
>> On 01/27/2014 10:11 AM, Andrew Lau wrote:
>>> Hi,
>>>
>>> Have you got power management enabled?
>>>
>>> That's the fencing feature required for the engine to ensure that the
>>> host is actually offline. It won't resume any other VMs to prevent
>>> potential VM corruption (eg. VM running on multiple hosts).
>>>
>>> Andrew.
>>>
>>> On Jan 27, 2014 5:12 PM, "Jaison peter" <urotrip2(a)gmail.com
>>> <mailto:urotrip2@gmail.com>> wrote:
>>>
>>> Hi all ,
>>>
>>> I was setting up a two-node oVirt cluster with the oVirt engine on a
>>> separate node. I completed the configuration and tested VM live
>>> migrations without any issues. Then, to check cluster HA, I
>>> powered down one host and expected the VMs running on that host to be
>>> migrated to the other one. But nothing happened: the engine detected
>>> the host as unreachable and marked it as non-operational, and the VMs
>>> that ran on that host went to 'unknown state'. Is it not possible to
>>> set up a fully HA oVirt cluster with two nodes? Or is it a problem
>>> with my configuration? Please advise.
>>>
>>> Thanks & Regards
>>>
>>> Alex
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users(a)ovirt.org <mailto:Users@ovirt.org>
>>>
http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>>>
>>
>> --
>> Dafna Ron
>
>