Thanks!
On Tue, Jan 28, 2014 at 2:04 PM, Eli Mesika <emesika(a)redhat.com> wrote:
----- Original Message -----
> From: "Jaison peter" <urotrip2(a)gmail.com>
> To: "Eli Mesika" <emesika(a)redhat.com>
> Cc: users(a)ovirt.org, "Tareq Alayan" <talayan(a)redhat.com>
> Sent: Tuesday, January 28, 2014 7:33:35 AM
> Subject: Re: [Users] two node ovirt cluster with HA
>
> Thank you all for your valuable feedback.
>
> Can you please specify some of the supported fencing devices in oVirt?
For oVirt 3.4 :
apc,apc_snmp,bladecenter,cisco_ucs,drac5,drac7,eps,hpblade,ilo,ilo2,ilo3,ilo4,ipmilan,rsa,rsb,wti
>
>
> On Mon, Jan 27, 2014 at 9:10 PM, Eli Mesika <emesika(a)redhat.com> wrote:
>
> >
> >
> > ----- Original Message -----
> > > From: "Tareq Alayan" <talayan(a)redhat.com>
> > > To: "Andrew Lau" <andrew(a)andrewklau.com>, "Eli Mesika" <emesika(a)redhat.com>
> > > Cc: dron(a)redhat.com, "Karli Sjöberg" <Karli.Sjoberg(a)slu.se>, users(a)ovirt.org
> > > Sent: Monday, January 27, 2014 2:59:02 PM
> > > Subject: Re: [Users] two node ovirt cluster with HA
> > >
> > > Adding Eli.
> >
> > I just want to summarize the requirement as I understand it:
> >
> > In the case that a Host that is running HA VMs and has PM configured is
> > turned off manually:
> >
> > 1) The non-responsive treatment should be modified to check Host status
> > via the PM agent
> > 2) If the Host is off, HA VMs will attempt to run on another host ASAP
> > 3) The host status should be set to DOWN
> > 4) No attempt to restart vdsm (soft fencing) or restart the host (hard
> > fencing) will be done
> >
> > Is the above correct? If so, an RFE on that can be opened.
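
The four steps above can be sketched roughly as follows. This is only an illustration in Python, not oVirt engine code: the names PowerStatus and plan_non_responsive_treatment, the status strings, and the handling of an unreachable PM device are all assumptions made for the example.

```python
from enum import Enum


class PowerStatus(Enum):
    """Host power state as reported by the PM agent (hypothetical names)."""
    ON = "on"
    OFF = "off"
    UNREACHABLE = "unreachable"  # the PM device itself did not answer


def plan_non_responsive_treatment(pm_status, ha_vms):
    """Return (host_status, vms_to_restart, fencing_steps) for a host
    that has stopped answering the engine."""
    if pm_status is PowerStatus.OFF:
        # Steps 2-4: the host is known to be off, so HA VMs can be started
        # elsewhere, the host is marked DOWN, and no fencing is attempted.
        return "DOWN", list(ha_vms), []
    if pm_status is PowerStatus.ON:
        # Host is powered but silent: keep the usual soft/hard fencing.
        return "NON_RESPONSIVE", [], ["soft_fence", "hard_fence"]
    # PM device unreachable: nothing can be confirmed, so nothing automatic.
    return "NON_RESPONSIVE", [], []
```

For example, plan_non_responsive_treatment(PowerStatus.OFF, ["vm1"]) yields a DOWN host, "vm1" queued for restart on another host, and an empty fencing list.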
> >
> > >
> > >
> > > On 01/27/2014 02:50 PM, Andrew Lau wrote:
> > > > Hi,
> > > >
> > > > I think he was asking what if the power management device reported
> > > > that the host was powered off. Then the VMs should be brought back up,
> > > > as being off would essentially be the same as running a power
> > > > cycle/reboot?
> > > >
> > > > Another example I'm seeing is what happens if the whole host loses
> > > > power and its power management device then becomes unavailable (i.e.
> > > > not reachable). Then you're stuck in the case where it requires manual
> > > > intervention.
> > > >
> > > > I would be interested to potentially see something like a timeout on
> > > > those problematic VMs (e.g. if nothing was read or written after x
> > > > amount of time), after which you could consider the host as offline?
> > > > I guess that adds a lot of risk, though.
> > > >
> > > >
> > > > On Mon, Jan 27, 2014 at 11:43 PM, Tareq Alayan <talayan(a)redhat.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Power management makes use of special *dedicated* hardware in
> > > > order to restart hosts independently of the host OS. The engine
> > > > connects to a power management device using a *dedicated*
> > > > network IP address.
> > > > The engine is capable of rebooting hosts that have entered a
> > > > non-operational or non-responsive state.
> > > > The abilities provided by all power management devices are:
> > > > check status, start, stop and recycle (restart).
> > > >
> > > > In the case of a non-responsive host, all of the VMs that are
> > > > currently running on that host can also become non-responsive.
> > > > However, the non-responsive host keeps its lock on the VM hard disk
> > > > for every VM it is running. Attempting to start a VM on a
> > > > different host and assigning the second host write privileges for
> > > > the virtual machine hard disk image can cause data corruption.
> > > > Rebooting allows the engine to assume that the lock on a VM hard
> > > > disk image has been released.
> > > > The engine can know for sure that the problematic host has been
> > > > rebooted via the power management device, and then it can start a
> > > > VM from the problematic host on another host without risking data
> > > > corruption.
> > > > Important note: A virtual machine that has been marked
> > > > highly-available cannot be safely started on a different host
> > > > without the certainty that doing so will not cause data corruption.
> > > >
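
The rule described above (only restart a VM elsewhere once the reboot of its original host has been confirmed) might be expressed like this. It is a minimal sketch under stated assumptions: HostState and safe_to_start_vm_elsewhere are hypothetical names for illustration and do not correspond to any oVirt API.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """What the engine knows about the host a VM was last running on."""
    responsive: bool                # does vdsm answer the engine's health checks?
    reboot_confirmed: bool = False  # did the PM device confirm the restart?


def safe_to_start_vm_elsewhere(origin: HostState) -> bool:
    """True only when the disk lock held by the origin host can be
    assumed released, i.e. the host was verifiably rebooted."""
    if origin.responsive:
        # The host is still managing its VMs; a second copy would risk
        # two writers on the same disk image.
        return False
    return origin.reboot_confirmed
```

A host that is merely silent (responsive=False, reboot_confirmed=False) therefore blocks the restart, which matches the "unknown" VM state reported in this thread.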
> > > > N-joy,
> > > >
> > > > --Tareq
> > > >
> > > >
> > > >
> > > >
> > > > On 01/27/2014 02:05 PM, Dafna Ron wrote:
> > > >
> > > > I am adding Tareq for the Power Management implementation.
> > > >
> > > > Dafna
> > > >
> > > >
> > > > On 01/27/2014 11:48 AM, Karli Sjöberg wrote:
> > > >
> > > > On Mon, 2014-01-27 at 11:11 +0000, Dafna Ron wrote:
> > > >
> > > > Powering off the host will never trigger vm migration.
> > > > As far as engine is concerned it just lost connection to the host, but
> > > > has no way of telling if the host is down or if a router is down.
> > > >
> > > > Can't it at least check with power management if the Host status is
> > > > down first?
> > > >
> > > > I mean, if the network is down there will be no response from either PM
> > > > or Host. But if PM is up and can tell you that the Host is down, sounds
> > > > rather clear cut to me...
> > > >
> > > > Seems to me the VMs would be restarted sooner if the flow was altered
> > > > to first check with PM if it's a network or Host issue, and if Host
> > > > issue, immediately restart VMs on another Host, instead of waiting for
> > > > a potentially problematic Host to boot up eventually.
> > > >
> > > > /K
> > > >
> > > > Since VMs can continue running on the host even if engine has no access
> > > > to it, starting the VMs on the second host can cause split brain and
> > > > data corruption.
> > > >
> > > > The way that the engine knows what's going on is by sending health check
> > > > queries to the vdsm.
> > > > Power management will try to reboot a host when the health checks to
> > > > vdsm are not answered.
> > > > So... if engine gets no reply and has no way of rebooting the host, the
> > > > host status will be changed to Non-Responsive and the VMs will be
> > > > unknown because engine has no way of knowing what's happening with the
> > > > VMs.
> > > > Since reboot of the host will kill the VMs running on it, this will
> > > > never cause any VM migration but... along with the High-Availability VM
> > > > feature, you will be able to have some of the VMs re-started on the
> > > > second host after the host reboot (and that is only if Power Management
> > > > was confirmed as successful).
> > > >
> > > > VM migration is only triggered when:
> > > > 1. Cluster configuration states that the VM should be migrated in case
> > > > of failure
> > > > 2. Engine has access to the host, so the failure is on the storage side
> > > > and not the host side.
> > > > 3. The VMs are not actively writing (although there might be a new RFE
> > > > for it).
> > > >
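
Read as a single predicate, the three conditions listed above combine as below. This is a sketch for illustration only; migration_triggered and its parameter names are made up here and are not part of the engine.

```python
def migration_triggered(migrate_on_failure: bool,
                        engine_reaches_host: bool,
                        vm_actively_writing: bool) -> bool:
    """All three conditions from the list must hold at the same time."""
    return (
        migrate_on_failure           # 1. cluster policy asks for migration
        and engine_reaches_host      # 2. failure is on the storage side
        and not vm_actively_writing  # 3. the VM is not writing right now
    )
```

A powered-off host fails the second condition, which is why the test described in this thread never produced a migration.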
> > > > hope this clears things up
> > > >
> > > > Dafna
> > > >
> > > >
> > > >
> > > > On 01/27/2014 10:11 AM, Andrew Lau wrote:
> > > >
> > > > Hi,
> > > >
> > > > Have you got power management enabled?
> > > >
> > > > That's the fencing feature required for the engine to ensure that the
> > > > host is actually offline. It won't resume any other VMs to prevent
> > > > potential VM corruption (e.g. a VM running on multiple hosts).
> > > >
> > > > Andrew.
> > > >
> > > > On Jan 27, 2014 5:12 PM, "Jaison peter" <urotrip2(a)gmail.com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I was setting up a two node ovirt cluster with ovirt engine on a
> > > > separate node. I completed the configuration and tested VM live
> > > > migrations without any issues. Then for checking cluster HA I
> > > > powered down one host and expected VMs running on that host to be
> > > > migrated to the other one. But nothing happened; the Engine detected
> > > > the host as unreachable and marked it as non-operational, and the VMs
> > > > that ran on that host went to 'unknown state'. Is it not possible to
> > > > set up a fully HA ovirt cluster with two nodes? Or else is that my
> > > > configuration problem? Please advise.
> > > >
> > > > Thanks & Regards
> > > >
> > > > Alex
> > > >
> > > >
> > > > _______________________________________________
> > > > Users mailing list
> > > > Users(a)ovirt.org
> > > > http://lists.ovirt.org/mailman/listinfo/users
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Dafna Ron
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
>