Some thoughts on enhancing High Availability in oVirt

Sun Feb 19 20:55:40 UTC 2012

On 19/02/12 17:42, Perry Myers wrote:
>>> Absolutely.
>>>
>>> In this case the Cloud Application is the combination of the two
>>> separate VM components (database VM and AS VM).  A CAPE (cloud
>>> application policy engine) maintains the HA state of both VMs including
>>> correcting for resource (db,as) or vm failures, and ensuring ordering
>>> constraints even during recovery (the AS would start after the DB in
>>> this model).
>>>
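
To make sure I read the ordering constraint correctly, here is a rough
sketch of "the AS would start after the DB" during recovery. The names
(db-vm, as-vm, START_AFTER, recovery_order) are made up for illustration
and are not pcmk-cloud code or its config format; it only assumes
Python 3.9+ for the standard graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical resource names. Each resource maps to the set of
# resources it must start after.
START_AFTER = {
    "db-vm": set(),
    "as-vm": {"db-vm"},
}


def recovery_order(failed, start_after=START_AFTER):
    """Return the failed resources in an order that respects the constraints."""
    full_order = TopologicalSorter(start_after).static_order()
    return [name for name in full_order if name in failed]
```

If both VMs fail, the DB VM would be restarted first; if only the AS VM
fails, it is the only one restarted.
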
>>
>> ok, what would the flow look like to the user (oVirt user)?
>>
>> - Adding a new service in OE
>> - Specifying for the service which VMs provide it (?)
> 
> That could work, or you could do:
> 
> 1. Adding a new VM (or set of VMs) in OE
> 2. Adding one or more services to associate with those VMs
> 
> Just depends on what the easier user experience is.  From the
> perspective of pcmk-cloud, we get the same data in the end, which is a
> config file that specifies the resources we care about (both VMs and
> services on those VMs).
> 
>> - Specify how the service can be monitored (how does CAPE know what
>> to look for as the service heartbeat?)
> 
> For each service you would specify whether or not to use:
> * An OCF resource agent (see the resource-agents package in Fedora and
>   other distros)
> * A systemd unit or SysV init script
> * Some other custom script (which would need to be either in OCF RA or
>   init script style)
> 
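
To check my understanding of these options, here is a rough sketch of
the per-service data OE might collect from the user. Everything in it
(MonitorKind, ServiceSpec, the field names, the example agent name) is
an assumption for illustration and is not the actual pcmk-cloud config
format. A custom script is not a separate kind here, since it would have
to follow either the OCF RA or the init script style.

```python
from dataclasses import dataclass
from enum import Enum


class MonitorKind(Enum):
    OCF = "ocf"          # OCF resource agent
    SYSTEMD = "systemd"  # systemd unit
    SYSV = "sysv"        # SysV init script


@dataclass(frozen=True)
class ServiceSpec:
    name: str           # service name shown to the user
    vm: str             # VM that provides the service
    kind: MonitorKind   # how the service is monitored and controlled
    agent: str          # agent, unit, or script name (hypothetical values)
    ha: bool = True     # whether the service is marked as HA


def validate(spec: ServiceSpec) -> None:
    """Reject specs that leave a required text field empty."""
    for field_name in ("name", "vm", "agent"):
        if not getattr(spec, field_name):
            raise ValueError(f"{field_name} must not be empty")
```
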
>> - Marking the service as HA
>>
>> What's next?
>> Where can the user define the policy about this service
> 
> There would need to be UI in OE that exposed an interface for adding
> policy information.  Because the Pacemaker policy engine is very
> flexible, it would make sense to only define very specific knobs in the
> UI, otherwise it could get very confusing for the users.  For more
> complex policies, it might be better to provide a way to manually edit
> the policy file and upload it rather than trying to model everything in
> the UI.
> 
>> (i.e. 'should be
>> available only on Tuesdays' or 'should be available only between
>> 0800-1700 CET' etc)?
> 
> For this example, what do you mean by 'should be available'?  In general
> with HA, the idea is to 'keep the service running as much as possible'.
> 

You are right, I mixed two use cases.
Let's focus on HA for a start.

Let's say CAPE finds that a VM/service is down: does it initiate runVM
through the OE API? Who chooses which host to start the VM on, and who
is responsible for doing setup work in case the VM requires it? For
example, if a VM is using a direct LUN, then we might need to connect
the target host to that LUN before starting the VM there.

If CAPE uses OE to start the VM, the setup will be taken care of by OE
as part of starting the VM.
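
A minimal sketch of the split of responsibilities I have in mind,
assuming CAPE only detects the failure and asks OE to run the VM, while
OE picks the host and does the setup. FakeEngine, run_vm, recover and
the host/LUN names are all hypothetical stand-ins, not the real OE API.

```python
class FakeEngine:
    """Stand-in for OE: chooses a host and does setup before starting a VM."""

    def __init__(self, hosts, direct_luns):
        self.hosts = list(hosts)
        self.direct_luns = dict(direct_luns)  # VM name -> LUN id
        self.log = []

    def run_vm(self, vm_name):
        host = self.hosts[0]  # placeholder for a real host selection policy
        lun = self.direct_luns.get(vm_name)
        if lun is not None:
            # Setup work: connect the target host to the direct LUN first.
            self.log.append(f"connect {host} to {lun}")
        self.log.append(f"start {vm_name} on {host}")
        return host


def recover(engine, vm_name, is_up):
    """CAPE side: if the VM is down, delegate the restart to the engine."""
    if is_up(vm_name):
        return None
    return engine.run_vm(vm_name)
```

With this split, CAPE never needs to know about hosts or LUNs; it only
needs a way to tell whether the VM is up and a call into OE.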

> The above example seems less like an HA concern and more of a general
> resource scheduling concern.  I think using the Pacemaker Rules engine
> with pcmk-cloud, this should be possible as well, but I'll let
> Andrew/Steve comment further on that.
> 
> Perry