[Users] two node ovirt cluster with HA
Dafna Ron
dron at redhat.com
Mon Jan 27 13:02:19 UTC 2014
Andrew,
Once this discussion is finished, if what you would like done is not in
the current implementation, can you please open a bug/feature request for
it?
Thanks,
Dafna
On 01/27/2014 12:59 PM, Tareq Alayan wrote:
> Adding Eli.
>
>
> On 01/27/2014 02:50 PM, Andrew Lau wrote:
>> Hi,
>>
>> I think he was asking what happens if the power management device
>> reports that the host is powered off. Shouldn't the VMs then be brought
>> back up, since being off is essentially the same as a power cycle/reboot?
>>
>> Another example I'm seeing is what happens if the whole host loses
>> power and its power management device then becomes unavailable (i.e.
>> not reachable). Then you're stuck in a situation that requires manual
>> intervention.
>>
>> I would be interested to potentially see something like a timeout on
>> those problematic VMs (e.g. if nothing was read or written after x
>> amount of time), after which you could consider the host offline. I
>> guess that adds a lot of risk, though.
>>
>>
>> On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <talayan at redhat.com
>> <mailto:talayan at redhat.com>> wrote:
>>
>> Hi,
>>
>> Power management makes use of special *dedicated* hardware in
>> order to restart hosts independently of the host OS. The engine
>> connects to a power management device using a *dedicated*
>> network IP address.
>> The engine is capable of rebooting hosts that have entered a
>> non-operational or non-responsive state.
>> The abilities provided by all power management devices are: check
>> status, start, stop and recycle (restart)...
>>
>> In the case of non-responsive host: all of the VMs that are
>> currently running on that host can also become non-responsive.
>> However, the non-responsive host keeps locking the VM hard disk
>> for all VMs it is running. Attempting to start a VM on a
>> different host and assign the second host write privileges for
>> the virtual machine hard disk image can cause data corruption.
>> Rebooting allows the engine to assume that the lock on a VM hard
>> disk image has been released.
>> The engine can know for sure that the problematic host has been
>> rebooted via the power management device and then it can start a
>> VM from the problematic host on another host without risking data
>> corruption.
>> Important note: A virtual machine that has been marked
>> highly-available can not be safely started on a different host
>> without the certainty that doing so will not cause data corruption.
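
The rule described above can be sketched as a small predicate. This is only an illustration of the reasoning, not engine code: the names FenceResult and may_restart_ha_vm are hypothetical, and the three fence outcomes are an assumption made for the example.

```python
from enum import Enum


class FenceResult(Enum):
    """Hypothetical outcomes of a fence (power management) attempt."""
    CONFIRMED = "confirmed"      # power management confirmed the reboot/stop
    FAILED = "failed"            # the command was sent but not confirmed
    UNREACHABLE = "unreachable"  # the power management device did not answer


def may_restart_ha_vm(host_responsive: bool, fence: FenceResult) -> bool:
    """Return True only if starting the VM elsewhere cannot race the old host."""
    if host_responsive:
        # The host still answers, so its VMs are not orphaned.
        return False
    # Only a confirmed fence lets us assume the disk lock was released.
    return fence is FenceResult.CONFIRMED


if __name__ == "__main__":
    for result in FenceResult:
        print(result.name, may_restart_ha_vm(False, result))
```

The point of the sketch is that an unreachable power management device is treated the same as a failed fence: without confirmation, the VM is not restarted.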
>>
>> N-joy,
>>
>> --Tareq
>>
>>
>>
>>
>> On 01/27/2014 02:05 PM, Dafna Ron wrote:
>>
>> I am adding Tareq for the Power Management implementation.
>>
>> Dafna
>>
>>
>> On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
>>
>> On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
>>
>> Powering off the host will never trigger vm migration.
>> As far as engine is concerned it just lost connection
>> to the host, but
>> has no way of telling if the host is down or if a
>> router is down.
>>
>> Can't it at least check with power management if the Host
>> status is down
>> first?
>>
>> I mean, if the network is down there will be no response
>> from either PM
>> or Host. But if PM is up and can tell you that the Host
>> is down, sounds
>> rather clear cut to me...
>>
>> Seems to me the VM's would be restarted sooner if the
>> flow was altered
>> to first check with PM if it's a network or Host issue,
>> and if Host
>> issue, immediately restart VM's on another Host, instead
>> of waiting for
>> a potentially problematic Host to boot up eventually.
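
The flow proposed in the message above could be sketched roughly as follows. This shows the suggestion only, not what the engine does today; the function name classify_outage and the "on"/"off" status strings are hypothetical, and None stands for "power management did not answer".

```python
from typing import Optional


def classify_outage(host_answers: bool, pm_status: Optional[str]) -> str:
    """Guess whether a lost host is a network issue or a host issue.

    pm_status is a hypothetical value: "on", "off", or None when the
    power management device itself could not be reached.
    """
    if host_answers:
        return "host up"
    if pm_status is None:
        # Neither the host nor PM answers: could be the network.
        return "unknown, possibly network issue"
    if pm_status == "off":
        # PM is reachable and says the host has no power.
        return "host down"
    # PM says the host has power, but the host does not answer.
    return "host powered on but unresponsive"


if __name__ == "__main__":
    print(classify_outage(False, "off"))
    print(classify_outage(False, None))
```

Only the "host down" case would be a candidate for restarting VMs immediately; in every other case the VMs may still be running on the original host.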
>>
>> /K
>>
>> since vm's can continue running on the host even if
>> engine has no access
>> to it, starting the vm's on the second host can cause
>> split brain and
>> data corruption.
>>
>> The way that the engine knows what's going on is by
>> sending health check
>> queries to vdsm.
>> Power management will try to reboot a host when the
>> health checks to
>> vdsm are not answered.
>> So... if engine gets no reply and has no way of
>> rebooting the host, the
>> host status will be changed to Non-Responsive and the
>> vm's will be
>> unknown because engine has no way of knowing what's
>> happening with the
>> vm's.
>> Since reboot of the host will kill the vm's running
>> on it - this will
>> never cause any vm migration but... along with the
>> High-Availability vm
>> feature, you will be able to have some of the vm's
>> re-started on the
>> second host after the host reboot (and that is only
>> if Power Management
>> was confirmed as successful).
>>
>> VM migration is only triggered when:
>> 1. Cluster configuration states that the vm should be
>> migrated in case
>> of failure
>> 2. Engine has access to the host - so the failure is
>> on the storage side
>> and not the host side.
>> 3. the vms are not actively writing (although there
>> might be a new RFE
>> for it).
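
The three conditions listed above can be written as a single check. This is a minimal sketch under the assumption that each condition is available as a boolean; MigrationContext and should_migrate are hypothetical names, not part of the engine.

```python
from dataclasses import dataclass


@dataclass
class MigrationContext:
    """Hypothetical inputs, one per condition in the list above."""
    cluster_migrates_on_failure: bool  # 1. cluster policy asks for migration
    engine_reaches_host: bool          # 2. failure is on the storage side
    vm_actively_writing: bool          # 3. VM is currently writing to disk


def should_migrate(ctx: MigrationContext) -> bool:
    """Migration is triggered only when all three conditions hold."""
    return (
        ctx.cluster_migrates_on_failure
        and ctx.engine_reaches_host
        and not ctx.vm_actively_writing
    )


if __name__ == "__main__":
    # A powered-off host fails condition 2, so no migration is triggered.
    print(should_migrate(MigrationContext(True, False, False)))
```

A host that was powered off can never satisfy the second condition, which is why the test in the original question did not trigger a migration.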
>>
>> hope this clears things up
>>
>> Dafna
>>
>>
>>
>> On 01/27/2014 10:11 AM, Andrew Lau wrote:
>>
>> Hi,
>>
>> Have you got power management enabled?
>>
>> That's the fencing feature required for the
>> engine to ensure that the
>> host is actually offline. It won't resume any
>> other VMs to prevent
>> potential VM corruption (eg. VM running on
>> multiple hosts).
>>
>> Andrew.
>>
>> On Jan 27, 2014 5:12 PM, "Jaison peter"
>> <urotrip2 at gmail.com <mailto:urotrip2 at gmail.com>
>> <mailto:urotrip2 at gmail.com
>> <mailto:urotrip2 at gmail.com>>> wrote:
>>
>> Hi all ,
>>
>> I was setting up a two-node oVirt cluster with the oVirt engine on a
>> separate node. I completed the configuration and tested VM live
>> migrations without any issues. Then, to check cluster HA, I powered
>> down one host and expected the VMs running on that host to be
>> migrated to the other one. But nothing happened: the engine detected
>> the host as unreachable and marked it as non-operational, and the VMs
>> that ran on that host went to 'unknown state'. Is it not possible to
>> set up a fully HA oVirt cluster with two nodes? Or is it a problem
>> with my configuration? Please advise.
>>
>> Thanks & Regards
>>
>> Alex
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org <mailto:Users at ovirt.org>
>> <mailto:Users at ovirt.org <mailto:Users at ovirt.org>>
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>>
>>
>>
>> --
>> Dafna Ron
>
--
Dafna Ron