On Thu, May 24, 2018 at 4:28 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, May 24, 2018 at 2:20 PM wodel youchi <wodel.youchi@gmail.com> wrote:


[snip]
 

We are migrating a physical high available application to oVirt.
The HA platform uses pacemaker, it contains two nodes and a shared storage, fence (stonith) is configured to use ILO.

I know that oVirt offers HA for VM, but this HA is not application aware, if a service crashes on the VM, it will not be detected.
Fencing could be achieved by fence agent rhev.

My questions are about the best way to migrate this platform to oVirt.

- Is it a good idea to make both VMs (formerly the physical nodes) Highly-Available VMs? Or maybe pin each of them to a particular hypervisor and/or use VM affinity?

If the VM should be highly available, you should not pin them to any host.
Pinning them will make sure the vm will *not* be available when the host
is down :-)

I think the OP means an alternate way to provide HA, more for a service inside a VM than for the VM itself.
E.g. to target the scenario in which you have one or many services on a VM, and only one of them (or only the service, but not the OS itself) fails for some reason.
So the question is about building a virtual cluster (e.g. with the RHCS cluster software in former RHEL/CentOS 6 versions, or with Pacemaker in RHEL/CentOS 7) that provides HA for the services you configure on it.
The two (or more) VMs composing the virtual cluster could have one vnic on the production LAN providing the services, and another vnic on the usual intra-cluster virtual LAN if the cluster software needs it.
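To make this concrete, a minimal sketch of how such a two-node virtual cluster could be bootstrapped with the RHEL/CentOS 7 tooling. All node names here are invented examples, not from this thread, and the exact pcs syntax varies between pcs versions:

```shell
# Hypothetical VMs "vnode1"/"vnode2", each with a second vnic on an
# intra-cluster vlan, resolving as vnode1-hb / vnode2-hb.
pcs cluster auth vnode1-hb vnode2-hb -u hacluster -p <password>
pcs cluster setup --name appcluster vnode1-hb vnode2-hb
pcs cluster start --all
# A two-node cluster cannot have real quorum; rely on fencing instead
pcs property set no-quorum-policy=ignore
```

Service resources (virtual IP, filesystem on the shared disk, the application itself) would then be layered on top with `pcs resource create`.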

btw: vSphere targeted similar kind of need with the App HA feature in 5.5:
http://www.virtualizationsoftware.com/vsphere-55-application-ha-advanced-application-monitoring/

I don't think it had great success, and I don't know whether it is still present in 6.x.

 

So you probably want to use HA VM - with a VM lease.

This is an alternate / different approach. 

 
- I am thinking about the situation where the hypervisor containing one of the VMs crashes: what will be the behavior of the fence agent on the application?

Not sure what you mean by "crashes".

Panic or lose power, as in the example you made for the HA VM.

The guest agent on the VM will not be able to do anything since it is not running :-)

It means that a cluster fence agent on the surviving VM will take care of monitoring the state of the other virtual node of the cluster and take action accordingly.
In the case of RHEL/CentOS, we are talking about the fence-agents-rhevm rpm package.

DESCRIPTION
       fence_rhevm  is  an I/O Fencing agent which can be used with RHEV-M REST API to fence virtual machines.
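A hedged sketch of how fence_rhevm could be registered as a stonith device with pcs (the engine address, credentials and VM names below are invented for illustration; parameter names are those of recent fence-agents releases):

```shell
# One stonith device per virtual node; "port" is the VM name as seen
# by the oVirt Engine, pcmk_host_list maps it to the cluster node name.
pcs stonith create fence_vm1 fence_rhevm \
    ipaddr=engine.example.com login=admin@internal passwd=secret \
    ssl=1 ssl_insecure=1 port=VM1 pcmk_host_list=vnode1
# ...and a mirror-image fence_vm2 device for the other node (port=VM2).
```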


 
   - if the crashed VM is not HA, it will not start on another hypervisor, so the fence agent will try to fence a VM that no longer exists, and it will get stuck.

In case of a non-HA VM1 crash with no affinity defined for it, it can start anywhere, on any host, so a problem on the host where it was running before is not an issue.
The rhevm fence agent on VM2 can get VM1's status from the oVirt Engine, and after a certain timeout that status should be "down" (thanks to the "mandatory" power management features of oVirt configured on the crashed host).
As soon as it gets the down status, it can fail the service over to itself and power VM1 on again.
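That status query can also be done by hand with the agent's CLI, the same call the cluster makes under the hood (the engine host and credentials below are examples; option letters may differ between fence-agents versions):

```shell
# Ask the oVirt Engine for VM1's power state over the REST API
fence_rhevm -a engine.example.com -l admin@internal -p secret \
    -z -n VM1 -o status
```

Only once this reports the VM as off should the surviving node proceed with the failover.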

  
   - if the crashed VM is HA, it will be started on another hypervisor, but what will happen with the fence agent? I think that one VM will fence the other one, and the application will still be unreachable for a longer period.

I would not configure a virtual cluster on HA VMs, because actions from the virtual cluster could interfere with actions from oVirt.
  

It is not clear which fence agent you are talking about.

See above: fence_rhevm 



 
- What about the shared storage? We will use a shared disk on oVirt, which does not support snapshots.

What is the question?


In the past I had to configure a virtual CentOS 6 cluster because I needed to reproduce a problem I had in a physical production cluster and verify whether some actions/updates would have solved it.
I had no spare hardware left to configure, so using the poor man's method (dd + reconfigure) I got the cluster up and running with two twin nodes identical to the physical ones.
I also opened this bugzilla to backport the el7 package to el6:
https://bugzilla.redhat.com/show_bug.cgi?id=1446474

The intra-cluster network was put on OVN, btw.

But honestly, I doubt I would use a virtual-cluster software stack to provide high availability to production services inside a VM. Too many inter-dependencies between the two layers.


 

- What are the things to avoid?

I think in general: don't have two mechanisms that try to do the same thing.

Either use your own HA solution or the oVirt HA solution, but not both at the same time.

Nir


Agreed

HIH,
Gianluca