Re: [Users] two node ovirt cluster with HA

27 Jan 2014

      On Mon, 2014-01-27 at 14:43 +0200, Tareq Alayan wrote:
...
Hi,
Power management uses special *dedicated* hardware to restart hosts 
independently of the host OS. The engine connects to a power 
management device using a *dedicated* network IP address.
The engine is capable of rebooting hosts that have entered a 
non-operational or non-responsive state.
The abilities provided by all power management devices are: check 
status, start, stop and recycle (restart)...
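
As a rough illustration of the four operations listed above, the sketch below models a power management device as a small in-memory Python class. The names `PowerState` and `FakePowerDevice` and all method names are hypothetical; they are not taken from oVirt or from any real fence agent, and a real device would be reached over its dedicated management address.

```python
from enum import Enum


class PowerState(Enum):
    ON = "on"
    OFF = "off"


class FakePowerDevice:
    """In-memory stand-in for a power management device (hypothetical)."""

    def __init__(self, state=PowerState.ON):
        self._state = state
        self.history = []  # names of the operations performed, in order

    def status(self):
        # Report the power state independently of the host OS.
        self.history.append("status")
        return self._state

    def start(self):
        self.history.append("start")
        self._state = PowerState.ON

    def stop(self):
        self.history.append("stop")
        self._state = PowerState.OFF

    def restart(self):
        # A power cycle is modelled as a stop followed by a start.
        self.stop()
        self.start()
```
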
In the case of a non-responsive host, all of the VMs that are currently 
running on that host can also become non-responsive. However, the 
non-responsive host keeps its lock on the hard disk image of every VM 
it is running. Attempting to start a VM on a different host and giving 
the second host write privileges for the virtual machine hard disk 
image can cause data corruption.
Exactly! If the Engine first checked with power management that the
problematic/non-responsive Host is indeed down, there would be no
risk of data corruption.

That's why I suggested a change in the HA flow: first check whether the
Host is indeed down and, if so, just start the VMs on another Host.

/K
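
The flow proposed here can be sketched as a small decision function. This is only an illustration of the suggestion, not the engine's actual logic; `Action`, `proposed_action` and both parameters are hypothetical names, and the final fallback to fencing is an assumption based on the reboot behaviour described elsewhere in this thread.

```python
from enum import Enum


class Action(Enum):
    RESTART_VMS_ELSEWHERE = "restart HA VMs on another host"
    FENCE_HOST = "reboot host via power management first"
    KEEP_WAITING = "cannot tell host failure from network failure"


def proposed_action(pm_reachable, pm_reports_host_off):
    """Decide what to do about a host that stopped answering (hypothetical)."""
    if not pm_reachable:
        # Neither the host nor its PM device answers: this may be a network
        # problem, so the VMs could still be running and must not be
        # started elsewhere.
        return Action.KEEP_WAITING
    if pm_reports_host_off:
        # PM confirms the host is powered off, so it cannot be writing
        # to the VM disk images any more.
        return Action.RESTART_VMS_ELSEWHERE
    # Assumption: a powered-on but unresponsive host is rebooted through
    # power management before its VMs are started anywhere else.
    return Action.FENCE_HOST
```

For example, `proposed_action(pm_reachable=True, pm_reports_host_off=True)` returns `Action.RESTART_VMS_ELSEWHERE`, which is the shortcut being asked for.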
...
Rebooting allows the engine to assume that the lock on a VM hard disk 
image has been released.
The engine can know for sure that the problematic host has been rebooted 
via the power management device and then it can start a VM from the 
problematic host on another host without risking data corruption.
Important note: A virtual machine that has been marked highly-available 
cannot be safely started on a different host without the certainty that 
doing so will not cause data corruption.
N-joy,
--Tareq
On 01/27/2014 02:05 PM, Dafna Ron wrote:
...
I am adding Tareq for the Power Management implementation.
Dafna
On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
...
On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
...
Powering off the host will never trigger VM migration.
As far as the engine is concerned, it just lost connection to the host
and has no way of telling if the host is down or if a router is down.
Can't it at least check with power management if the Host status is down
first?
I mean, if the network is down there will be no response from either PM
or Host. But if PM is up and can tell you that the Host is down, that
sounds rather clear cut to me...
Seems to me the VMs would be restarted sooner if the flow was altered
to first check with PM whether it's a network or a Host issue and, if it
is a Host issue, immediately restart the VMs on another Host, instead of
waiting for a potentially problematic Host to boot up eventually.
/K
...
Since VMs can continue running on the host even if the engine has no
access to it, starting the VMs on the second host can cause split brain
and data corruption.
The way that the engine knows what's going on is by sending health check
queries to vdsm.
Power management will try to reboot a host when the health checks to
vdsm are not answered.
So... if the engine gets no reply and has no way of rebooting the host,
the host status will be changed to Non-Responsive and the VMs will be
unknown, because the engine has no way of knowing what's happening with
the VMs.
Since a reboot of the host will kill the VMs running on it, this will
never cause any VM migration. But along with the High-Availability VM
feature, you will be able to have some of the VMs restarted on the
second host after the host reboot (and only if Power Management
was confirmed as successful).
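
The outcomes described above can be summarised in a short sketch. It only restates this explanation in code and is not the engine's implementation; the function name, its parameters and the "Rebooted" label are hypothetical, while "Non-Responsive" and "unknown" are the states named in the text.

```python
def host_outcome(vdsm_answers, pm_configured, fence_succeeded):
    """Return (host_status, vm_result) for one health-check round (hypothetical)."""
    if vdsm_answers:
        # Health checks are answered: nothing to do.
        return ("Up", "VMs keep running")
    if pm_configured and fence_succeeded:
        # The reboot was confirmed by power management, so the VMs on that
        # host are known to be dead and HA VMs can be started elsewhere.
        return ("Rebooted", "HA VMs restarted on another host")
    # No reply and no confirmed reboot: the engine cannot know whether the
    # VMs are still running, so it must not start them elsewhere.
    return ("Non-Responsive", "VMs unknown")
```

Note that no branch performs a migration: the VMs are either left alone or restarted after a confirmed reboot.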
VM migration is only triggered when:
1. The cluster configuration states that the VM should be migrated in
case of failure.
2. The engine has access to the host, so the failure is on the storage
side and not the host side.
3. The VMs are not actively writing (although there might be a new RFE
for it).
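
Read together, the three conditions above form a single predicate in which all of them must hold. A minimal sketch, assuming hypothetical field names that do not correspond to any real oVirt setting:

```python
from dataclasses import dataclass


@dataclass
class MigrationContext:
    cluster_migrates_on_failure: bool  # condition 1: cluster configuration
    engine_can_reach_host: bool        # condition 2: failure is not on the host side
    vm_actively_writing: bool          # condition 3: must be False


def migration_triggered(ctx):
    """True only when all three conditions from the list hold (hypothetical)."""
    return (
        ctx.cluster_migrates_on_failure
        and ctx.engine_can_reach_host
        and not ctx.vm_actively_writing
    )
```

A powered-off host fails the second condition, which is why powering a host down never results in a migration.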
hope this clears things up
Dafna
On 01/27/2014 10:11 AM, Andrew Lau wrote:
...
Hi,
Have you got power management enabled?
That's the fencing feature required for the engine to ensure that the
host is actually offline. Without it, the engine won't restart the VMs
elsewhere, to prevent potential VM corruption (e.g. a VM running on
multiple hosts).
Andrew.
On Jan 27, 2014 5:12 PM, "Jaison peter" <urotrip2@gmail.com> wrote:
Hi all,
I was setting up a two-node oVirt cluster with the oVirt engine on a
separate node. I completed the configuration and tested VM live
migrations without any issues. Then, to check cluster HA, I powered
down one host and expected the VMs running on that host to be migrated
to the other one. But nothing happened: the engine detected the host
as unreachable and marked it as non-operational, and the VMs that ran
on that host went to 'unknown state'. Is it not possible to set up a
fully HA oVirt cluster with two nodes? Or is it a problem with my
configuration? Please advise.
Thanks & Regards
Alex
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
-- 
Dafna Ron
-- 

Med Vänliga Hälsningar

-------------------------------------------------------------------------------
Karli Sjöberg
Swedish University of Agricultural Sciences Box 7079 (Visiting Address
Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone:  +46-(0)18-67 15 66
karli.sjoberg@slu.se