On Thu, May 24, 2018 at 4:28 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
On Thu, May 24, 2018 at 2:20 PM wodel youchi <wodel.youchi(a)gmail.com> wrote:
>
>
[snip]
> We are migrating a physical high available application to oVirt.
> The HA platform uses pacemaker, it contains two nodes and a shared
> storage, fence (stonith) is configured to use ILO.
>
> I know that oVirt offers HA for VM, but this HA is not application aware,
> if a service crashes on the VM, it will not be detected.
> Fencing could be achieved by fence agent rhev.
>
> My questions are about the best way to migrate this platform to oVirt.
>
> - Is it a good idea to make both VMs (formally nodes) Highly-Available
> VMs? or may be pin each one of them to a particular hypervisor and/or use
> VM-Affinity?
>
If the VMs should be highly available, you should not pin them to any host.
Pinning them will make sure the VM will *not* be available when its host
is down :-)
I think the OP means an alternate way to provide HA, more for a service
inside a VM than for the VM itself, e.g. to target the scenario in which
you have one or many services on a VM and only one of them (or only the
service, but not the OS itself) fails for some reason.
So the question is about building a virtual cluster (e.g. with the RHCS
cluster software in former RHEL/CentOS 6 versions, or with Pacemaker in
RHEL/CentOS 7) that provides HA for the services that you configure on it.
The 2 (or more) VMs composing the virtual cluster could have a vnic on the
production LAN providing services, and another vnic for the usual
intra-cluster virtual LAN if needed by the cluster software.
BTW: vSphere targeted a similar kind of need with the App HA feature in 5.5:
http://www.virtualizationsoftware.com/vsphere-55-application-ha-advanced-application-monitoring/
I think it did not have great success and I don't know if it is still
present in 6.x.
So you probably want to use an HA VM - with a VM lease.
This is an alternate/different approach.
> - I am thinking about the situation where the hypervisor containing one
> of the VMs crashes, what will be the behavior of the the fence agent on the
> application?
>
Not sure what you mean by "crashes".
Panic or loss of power, as in the example you made for HA VMs.
The guest agent on the VM will not be able to do anything since it is not
running :-)
It means that a cluster fence agent on the surviving VM will take care of
monitoring the state of the other virtual node of the cluster and take
actions accordingly.
In the case of RHEL/CentOS we are talking about the fence-agents-rhevm rpm
package.
DESCRIPTION
fence_rhevm is an I/O Fencing agent which can be used with RHEV-M
REST API to fence virtual machines.
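Just as a sketch, one can run the agent manually from the surviving node to
check that it can query the engine about the peer VM; the engine address,
credentials and VM name below are made up, and option names may vary slightly
between fence-agents versions:

  # assumptions: engine at ovirt-engine.example.com, admin@internal account,
  # the peer cluster node is the oVirt VM named vmnode1
  fence_rhevm -a ovirt-engine.example.com -l admin@internal -p secret \
      -z --ssl-insecure -n vmnode1 -o status

If that returns the VM power status, the same parameters can then be reused
in the cluster stonith configuration.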
> - if the crashed VM is not HA, it will not start on another
> hypervisor, so the fence agent will try to fence a VM that does not exist
> anymore, and it will get stuck.
>
In case of a non-HA VM1 crash, with no affinity defined for it, it can start
anywhere, on any host, so problems on the host where it was running before
will not be an issue.
The rhevm fence agent on VM2 is able to get the VM1 status from the oVirt
Engine, and after a certain timeout that status should be: down (thanks to
the "mandatory" power management features of oVirt defined on the crashed
host).
As soon as it gets the down status it can fail over the service onto itself
and power VM1 on again.
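Just to make the failover part concrete: the services themselves are ordinary
Pacemaker resources on the virtual cluster. A minimal sketch, with a
hypothetical floating IP and a hypothetical systemd service named myapp,
could be:

  # assumptions: service unit myapp.service, floating IP 192.0.2.10
  pcs resource create app-ip IPaddr2 ip=192.0.2.10 cidr_netmask=24
  pcs resource create app-svc systemd:myapp
  pcs resource group add app-group app-ip app-svc

so that when VM1 is declared fenced/down, the group is started on VM2.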
> - if the crashed VM is HA, it will be started on another hypervisor,
> but what will happen with the fence agent? I think that one VM will fence
> the other one, and the application will still be unreachable for a longer
> period.
>
I would not configure a virtual cluster on HA VMs, because actions from the
virtual cluster could interfere with actions from oVirt.
Not clear what fence agent you are talking about.
See above: fence_rhevm
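As a rough sketch of what that could look like inside an el7 virtual cluster
(VM names, credentials and parameter names are only illustrative and may
differ between fence-agents/pcs versions):

  # assumptions: cluster nodes vmnode1/vmnode2 are oVirt VMs with the same names
  pcs stonith create fence-vmnode1 fence_rhevm \
      ipaddr=ovirt-engine.example.com login=admin@internal passwd=secret \
      ssl=1 ssl_insecure=1 port=vmnode1 pcmk_host_list=vmnode1
  pcs constraint location fence-vmnode1 avoids vmnode1

plus the symmetric resource for vmnode2.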
> - What about the shared storage, we will use a shared disk on oVirt which
> does not support snapshot
>
What is the question?
In the past I had to configure a virtual CentOS 6 cluster because I needed
to replicate a problem I had in a physical production cluster and to verify
whether some actions/updates would have solved the problem.
I had no spare hw left to configure, so using the poor-man method (dd +
reconfigure) I got the cluster up and running with two twin nodes identical
to the physical ones.
I also opened this bugzilla to backport the el7 package to el6:
https://bugzilla.redhat.com/show_bug.cgi?id=1446474
The intra-cluster network was put on OVN, btw.
But honestly I doubt I would use a virtual cluster software stack to provide
high availability to production services inside a VM. Too many
inter-relations.
>
> - What are the things to avoid?
>
I think in general, don't try to have two mechanisms that try to do the
same thing.
Either use your HA solution or the oVirt HA solution, but not both at the
same time.
Nir
Agreed
HIH,
Gianluca