Re: [Users] two node ovirt cluster with HA

28 Jan 2014


      Thanks !


On Tue, Jan 28, 2014 at 2:04 PM, Eli Mesika <emesika@redhat.com> wrote:
----- Original Message -----
From: "Jaison peter" <urotrip2@gmail.com>
To: "Eli Mesika" <emesika@redhat.com>
Cc: users@ovirt.org, "Tareq Alayan" <talayan@redhat.com>
Sent: Tuesday, January 28, 2014 7:33:35 AM
Subject: Re: [Users] two node ovirt cluster with HA
Thank you all for your valuable feedback.
Can you please specify some of the supported fencing devices in oVirt?
For oVirt 3.4:
apc,apc_snmp,bladecenter,cisco_ucs,drac5,drac7,eps,hpblade,ilo,ilo2,ilo3,ilo4,ipmilan,rsa,rsb,wti
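
A quick way to check a configured agent type against the list above. This is only an illustrative sketch: the names in the set are copied from the reply, while `is_supported_fence_type` is a hypothetical helper and not part of any oVirt API.

```python
# Fence agent types quoted in this thread for oVirt 3.4.
OVIRT_34_FENCE_TYPES = frozenset(
    "apc,apc_snmp,bladecenter,cisco_ucs,drac5,drac7,eps,hpblade,"
    "ilo,ilo2,ilo3,ilo4,ipmilan,rsa,rsb,wti".split(",")
)


def is_supported_fence_type(name: str) -> bool:
    """Hypothetical helper: case-insensitive membership test."""
    return name.strip().lower() in OVIRT_34_FENCE_TYPES
```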
On Mon, Jan 27, 2014 at 9:10 PM, Eli Mesika <emesika@redhat.com> wrote:
----- Original Message -----
From: "Tareq Alayan" <talayan@redhat.com>
To: "Andrew Lau" <andrew@andrewklau.com>, "Eli Mesika" <emesika@redhat.com>
Cc: dron@redhat.com, "Karli Sjöberg" <Karli.Sjoberg@slu.se>, users@ovirt.org
Sent: Monday, January 27, 2014 2:59:02 PM
Subject: Re: [Users] two node ovirt cluster with HA
Adding Eli.
I just want to summarize the requirement as I understand it:
In the case that a Host that is running HA VMs and has PM configured
...
...
turned off manually:
1) The non-responsive treatment should be modified to check Host status
   via the PM agent
2) If the Host is off, HA VMs will attempt to run on another host ASAP
3) The host status should be set to DOWN
4) No attempt to restart vdsm (soft fencing) or restart the host (hard
   fencing) will be done
Is the above correct? If so, an RFE for it can be opened.
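
The four points above can be read as a small decision function. The following is a minimal sketch of that proposal only, not of the engine's actual code: `PowerStatus`, `Plan` and `plan_non_responsive_treatment` are hypothetical names, and the fallback branch (keep the host Non-Responsive and let the usual fencing attempts run) is an assumption based on the behaviour described elsewhere in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class PowerStatus(Enum):
    ON = "on"
    OFF = "off"
    UNKNOWN = "unknown"  # PM agent unreachable or gave no usable answer


@dataclass(frozen=True)
class Plan:
    host_status: str
    restart_ha_vms_elsewhere: bool
    attempt_fencing: bool  # soft (restart vdsm) or hard (restart host)


def plan_non_responsive_treatment(pm_status: PowerStatus) -> Plan:
    """Point 1: consult the PM agent before doing anything else."""
    if pm_status is PowerStatus.OFF:
        # Points 2-4: host is DOWN, HA VMs may start elsewhere, no fencing.
        return Plan("DOWN", restart_ha_vms_elsewhere=True, attempt_fencing=False)
    # Assumed fallback: host may still be running VMs, so nothing is
    # restarted elsewhere until fencing has been attempted.
    return Plan("NON_RESPONSIVE", restart_ha_vms_elsewhere=False, attempt_fencing=True)
```

Only a definite "off" answer from the PM agent short-circuits fencing; an unreachable PM device is deliberately treated like a host that may still be running.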
On 01/27/2014 02:50 PM, Andrew Lau wrote:
Hi,
I think he was asking what if the power management device reported that
the host was powered off. Then VMs should be brought back up, as being
off would essentially be the same as running a power cycle/reboot?
Another example I'm seeing is what happens if the whole host loses power
and its power management device then becomes unavailable (i.e. not
reachable): then you're stuck in the case where it requires manual
intervention.
I would be interested to potentially see something like a timeout on
those problematic VMs (e.g. if nothing was read or written after x amount
of time), after which you could consider the host as offline? I guess
that adds a lot of risk, though.
On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <talayan@redhat.com> wrote:
Hi,
Power management makes use of special *dedicated* hardware in order to
restart hosts independently of the host OS. The engine connects to a
power management device using a *dedicated* network IP address.
The engine is capable of rebooting hosts that have entered a
non-operational or non-responsive state.
The abilities provided by all power management devices are: check
status, start, stop and recycle (restart)...
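
To make the four operations concrete, here is a toy in-memory model of such a device. It is a sketch for illustration only: `InMemoryPowerDevice` and `PowerState` are hypothetical names, and modeling a recycle as stop followed by start is an assumption rather than a description of any real fence agent.

```python
from enum import Enum


class PowerState(Enum):
    ON = "on"
    OFF = "off"


class InMemoryPowerDevice:
    """Hypothetical stand-in for a power management device."""

    def __init__(self, state: PowerState = PowerState.ON) -> None:
        self._state = state
        self.power_cycles = 0  # number of completed restarts

    def status(self) -> PowerState:
        return self._state

    def start(self) -> None:
        self._state = PowerState.ON

    def stop(self) -> None:
        self._state = PowerState.OFF

    def restart(self) -> None:
        # Assumption: a recycle is a stop followed by a start.
        self.stop()
        self.start()
        self.power_cycles += 1
```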
In the case of a non-responsive host: all of the VMs that are currently
running on that host can also become non-responsive. However, the
non-responsive host keeps locking the VM hard disk for all VMs it is
running. Attempting to start a VM on a different host and assign the
second host write privileges for the virtual machine hard disk image can
cause data corruption.
Rebooting allows the engine to assume that the lock on a VM hard disk
image has been released.
The engine can know for sure that the problematic host has been rebooted
via the power management device, and then it can start a VM from the
problematic host on another host without risking data corruption.
Important note: A virtual machine that has been marked highly-available
cannot be safely started on a different host without the certainty that
doing so will not cause data corruption.
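
The rule described above (no restart on another host until the old host's disk lock is known to be released) can be sketched as follows. This is a simplified model under stated assumptions, not engine code: the `lock_holder` dictionary, the `reboot_confirmed` callback and the function name are all hypothetical.

```python
from typing import Callable, Dict


def restart_vm_elsewhere(
    vm: str,
    old_host: str,
    new_host: str,
    lock_holder: Dict[str, str],
    reboot_confirmed: Callable[[], bool],
) -> bool:
    """Give new_host the disk lock for vm only when that is provably safe."""
    if lock_holder.get(vm) == old_host:
        # The old host may still be writing to the disk image.
        if not reboot_confirmed():
            return False  # cannot prove the lock was released
        del lock_holder[vm]  # a confirmed reboot implies release
    if vm in lock_holder:
        return False  # some other host holds the image
    lock_holder[vm] = new_host
    return True
```

For example, with `locks = {"vm1": "hostA"}`, calling the function with a callback that returns `False` leaves `locks` untouched, while a callback that returns `True` moves the lock to the new host.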
N-joy,
--Tareq
On 01/27/2014 02:05 PM, Dafna Ron wrote:
I am adding Tareq for the Power Management implementation.
Dafna
On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
Powering off the host will never trigger vm migration.
As far as engine is concerned it just lost connection to the host, but
has no way of telling if the host is down or if a router is down.
Can't it at least check with power management if the Host status is down
first?
I mean, if the network is down there will be no response from either PM
or Host. But if PM is up and can tell you that the Host is down, sounds
rather clear cut to me...
Seems to me the VMs would be restarted sooner if the flow was altered to
first check with PM if it's a network or Host issue, and if Host issue,
immediately restart VMs on another Host, instead of waiting for a
potentially problematic Host to boot up eventually.
/K
Since VMs can continue running on the host even if engine has no access
to it, starting the VMs on the second host can cause split brain and
data corruption.
The way that the engine knows what's going on is by sending health check
queries to the vdsm.
Power management will try to reboot a host when the health checks to
vdsm are not answered.
So... if engine gets no reply and has no way of rebooting the host, the
host status will be changed to Non-Responsive and the VMs will be
unknown, because engine has no way of knowing what's happening with the
VMs.
Since a reboot of the host will kill the VMs running on it, this will
never cause any VM migration. But along with the High-Availability VM
feature, you will be able to have some of the VMs restarted on the
second host after the host reboot (and that is only if Power Management
was confirmed as successful).
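
As a rough sketch of the behaviour just described, the outcome after unanswered health checks depends on whether a reboot through power management was confirmed. The function name and the state labels "Rebooted", "Down" and "restart on another host" are placeholders chosen for illustration; only "Non-Responsive" and "Unknown" come from the description above.

```python
from typing import Dict, Tuple


def react_to_unanswered_health_checks(
    fence_confirmed: bool, vm_is_ha: Dict[str, bool]
) -> Tuple[str, Dict[str, str]]:
    """Return (host status, per-VM outcome) once vdsm stops answering."""
    if not fence_confirmed:
        # No PM configured, or the reboot could not be confirmed.
        return "Non-Responsive", {vm: "Unknown" for vm in vm_is_ha}
    # The reboot killed every VM on the host; only HA VMs come back.
    return "Rebooted", {
        vm: "restart on another host" if ha else "Down"
        for vm, ha in vm_is_ha.items()
    }
```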
VM migration is only triggered when:
1. Cluster configuration states that the VM should be migrated in case
   of failure
2. Engine has access to the host - so the failure is on the storage side
   and not the host side.
3. The VMs are not actively writing (although there might be a new RFE
   for it).
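
Read as a conjunction, the three conditions above amount to a single predicate. This is a sketch of that reading only; the function and parameter names are hypothetical, and treating the list as "all three must hold" is an interpretation of the text.

```python
def migration_is_triggered(
    cluster_migrates_on_failure: bool,
    engine_reaches_host: bool,
    vm_actively_writing: bool,
) -> bool:
    """All three listed conditions must hold for a migration to start."""
    return (
        cluster_migrates_on_failure
        and engine_reaches_host
        and not vm_actively_writing
    )
```

A powered-off host fails the second condition, which is why it leads to fencing and HA restart rather than migration.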
hope this clears things up
Dafna
On 01/27/2014 10:11 AM, Andrew Lau wrote:
Hi,
Have you got power management enabled?
That's the fencing feature required for the engine to ensure that the
host is actually offline. It won't resume any other VMs to prevent
potential VM corruption (e.g. VM running on multiple hosts).
Andrew.
On Jan 27, 2014 5:12 PM, "Jaison peter" <urotrip2@gmail.com> wrote:
Hi all,
I was setting up a two node oVirt cluster with the oVirt engine on a
separate node. I completed the configuration and tested VM live
migrations without any issues. Then, to check cluster HA, I powered
down one host and expected the VMs running on that host to be migrated
to the other one. But nothing happened: the engine detected the host as
unreachable and marked it as non-operational, and the VMs that ran on
that host went to 'unknown state'. Is it not possible to set up a fully
HA oVirt cluster with two nodes, or else is that my configuration
problem? Please advise.
Thanks & Regards
Alex

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
--
                Dafna Ron