Some thoughts on enhancing High Availability in oVirt

Tue Feb 21 00:44:29 UTC 2012

On 20/02/12 2:42 AM, Perry Myers wrote:
>>> Absolutely.
>>>
>>> In this case the Cloud Application is the combination of the two
>>> separate VM components (database VM and AS VM).  A CAPE (cloud
>>> application policy engine) maintains the HA state of both VMs including
>>> correcting for resource (db,as) or vm failures, and ensuring ordering
>>> constraints even during recovery (the AS would start after the DB in
>>> this model).
>>>
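
The ordering constraint described above (the AS starts after the DB) can be sketched as a dependency graph sorted topologically. This is only an illustration: the resource names and the `depends_on` map are hypothetical, and it is not how pcmk-cloud or the Pacemaker policy engine is actually implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each resource lists what must be started first.
depends_on = {
    "db_vm": [],          # the database VM has no prerequisites
    "as_vm": ["db_vm"],   # the application server VM needs the database
}

def start_order(deps):
    """Return resources in an order that respects the 'start after' constraints."""
    return list(TopologicalSorter(deps).static_order())

print(start_order(depends_on))
```

The same order would be reused during recovery: if the DB fails, anything that depends on it is restarted only after the DB is back.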
>>
>> ok, what would the flow look like to the user (oVirt user)?
>>
>> - Adding new service in OE
>> - Specifying for the service which VMs provide it (?)
>
> That could work, or you could do:
>
> 1. Adding a new VM (or set of VMs) in OE
> 2. Adding one or more services to associate with those VMs
>
> Just depends on what the easier user experience is.  From the
> perspective of pcmk-cloud, we get the same data in the end, which is a
> config file that specifies the resources we care about (both VMs and
> services on those VMs)
>
>> - Specify how the service can be monitored (? how does CAPE know what
>> to look for as the service heartbeat?)
>
> For each service you would specify whether or not to use:
> * an OCF resource agent (see the resource-agents package in Fedora and
>    other distros)
> * A systemd unit or sysV init script
> * Some other custom script (which would need to be either in OCF RA or
>    init script style)
>
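
As a rough sketch of the three monitoring choices listed above, the per-service setting could map to a status command like this. The `Service` class, its fields, and the default provider are assumptions made for illustration, not part of pcmk-cloud; the commands are only built here, not executed. A custom script would fall under the OCF or init-script case, depending on which style it follows.

```python
from dataclasses import dataclass
from enum import Enum

class MonitorKind(Enum):
    OCF = "ocf"          # OCF resource agent
    SYSTEMD = "systemd"  # systemd unit
    SYSV = "sysv"        # sysV init script

@dataclass
class Service:
    # Hypothetical record of what the user would enter for a service.
    name: str
    kind: MonitorKind
    provider: str = "heartbeat"  # assumed OCF provider directory, OCF only

def monitor_command(svc):
    """Build (but do not run) the command used to check service health."""
    if svc.kind is MonitorKind.OCF:
        # OCF agents take the action name as their first argument.
        return [f"/usr/lib/ocf/resource.d/{svc.provider}/{svc.name}", "monitor"]
    if svc.kind is MonitorKind.SYSTEMD:
        return ["systemctl", "is-active", f"{svc.name}.service"]
    if svc.kind is MonitorKind.SYSV:
        return [f"/etc/init.d/{svc.name}", "status"]
    raise ValueError(f"unknown monitor kind: {svc.kind}")

print(monitor_command(Service("httpd", MonitorKind.SYSTEMD)))
```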
>> - Marking the service as HA
>>
>> What's next?
>> Where can the user define the policy about this service
>
> There would need to be UI in OE that exposed an interface for adding
> policy information.  Because the Pacemaker policy engine is very
> flexible, it would make sense to only define very specific knobs in the
> UI, otherwise it could get very confusing for the users.  For more
> complex policies, it might be better to provide a way to manually edit
> the policy file and upload it rather than trying to model everything in
> the UI.

Definitely agree.
You'd want to figure out the use cases and then how to map that to PE 
concepts.  Don't start with the PE concepts and work backwards :-)

>
>> (i.e. 'should be
>> available only on Tuesdays' or 'should be available only between
>> 0800-1700 CET' etc)?
>
> For this example, what do you mean by 'should be available'?  In general
> with HA, the idea is to 'keep the service running as much as possible'.

You can tell the PE that a given resource should only be running during 
certain times though.
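
To make the idea concrete, here is a minimal sketch of the kind of time-window check such a rule expresses, using the "only on Tuesdays" and "0800-1700" examples from above. The function name and parameters are made up for illustration; this is not the Pacemaker rule syntax, and timezone handling is deliberately left out.

```python
from datetime import datetime

def should_run(now, start_hour=8, end_hour=17, weekdays=range(7)):
    """True if 'now' falls on an allowed weekday and inside the hour window.

    weekdays uses Python's convention: Monday is 0, Tuesday is 1, ...
    The window is half-open: start_hour <= hour < end_hour.
    """
    return now.weekday() in weekdays and start_hour <= now.hour < end_hour

# 2012-02-21 was a Tuesday.
print(should_run(datetime(2012, 2, 21, 9, 0)))                 # inside 0800-1700
print(should_run(datetime(2012, 2, 22, 9, 0), weekdays={1}))   # Wednesday, Tuesdays only
```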

> The above example seems less like an HA concern and more of a general
> resource scheduling concern.  I think using the Pacemaker Rules engine
> with pcmk-cloud, this should be possible as well, but I'll let
> Andrew/Steve comment further on that.

It is.  Whether you really want that is a separate question :-)