Some thoughts on enhancing High Availability in oVirt

Tue Feb 21 16:27:58 UTC 2012

On 02/21/2012 06:09 AM, Livnat Peer wrote:
> On 21/02/12 03:34, Steven Dake wrote:
>> On 02/19/2012 01:55 PM, Livnat Peer wrote:
>>> On 19/02/12 17:42, Perry Myers wrote:
>>>>>> Absolutely.
>>>>>>
>>>>>> In this case the Cloud Application is the combination of the two
>>>>>> separate VM components (database VM and AS VM).  A CAPE (cloud
>>>>>> application policy engine) maintains the HA state of both VMs including
>>>>>> correcting for resource (db,as) or vm failures, and ensuring ordering
>>>>>> constraints even during recovery (the AS would start after the DB in
>>>>>> this model).
>>>>>>
>>>>>
>>>>> OK, what would the flow look like to the user (oVirt user)?
>>>>>
>>>>> - Adding new service in OE
>>>>> - Specifying for the service which VMs provide it (?)
>>>>
>>>> That could work, or you could do:
>>>>
>>>> 1. Adding a new VM (or set of VMs) in OE
>>>> 2. Adding one or more services to associate with those VMs
>>>>
>>>> Just depends on what the easier user experience is.  From the
>>>> perspective of pcmk-cloud, we get the same data in the end, which is a
>>>> config file that specifies the resources we care about (both VMs and
>>>> services on those VMs)
>>>>
>>>>> - Specify how the service can be monitored (? how does CAPE know what
>>>>> to look for as the service heartbeat?)
>>>>
>>>> For each service you would specify whether or not to use:
>>>> * an OCF resource agent (see the resource-agents package in Fedora and
>>>>   other distros)
>>>> * A systemd unit or sysV init script
>>>> * Some other custom script (which would need to be either in OCF RA or
>>>>   init script style)
>>>>
>>>>> - Marking the service as HA
>>>>>
>>>>> What's next?
>>>>> Where can the user define the policy about this service
>>>>
>>>> There would need to be UI in OE that exposed an interface for adding
>>>> policy information.  Because the Pacemaker policy engine is very
>>>> flexible, it would make sense to only define very specific knobs in the
>>>> UI, otherwise it could get very confusing for the users.  For more
>>>> complex policies, it might be better to provide a way to manually edit
>>>> the policy file and upload it rather than trying to model everything in
>>>> the UI.
>>>>
>>>>> (i.e. 'should be
>>>>> available only on Tuesdays' or 'should be available only between
>>>>> 0800-1700 CET' etc)?
>>>>
>>>> For this example, what do you mean by 'should be available'?  In general
>>>> with HA, the idea is to 'keep the service running as much as possible'.
>>>>
>>>
>>> You are right, I mixed two use cases.
>>> Let's focus on HA for a start.
>>>
>>> Let's say CAPE finds that a VM/service is down. Does it initiate runVM
>>> via the OE API? Who chooses which host to start the VM on, and who is
>>> responsible for doing setup work in case it is required by the VM? For
>>> example, if a VM is using a direct LUN then we might need to connect the
>>> host to that LUN before starting the VM on the target host.
>>>
>>> If CAPE uses OE to start the VM, the setup will be taken care of by OE as
>>> part of starting the VM.
>>>
>>>
>>
>> Currently CAPE uses deltacloud APIs to start/stop instances.
>>
>> Choosing which host to start the VM on is an act of scheduling
>> which, in our model, is in the domain of the IAAS platform.  I expect
>> the typical start operation would look like:
>> 1. cape determines which VMs to start
>> 2. cape sends instance start operations to deltacloudd
>> 3. deltacloudd sends instance start operations to OE API
>> 4. OE starts the vms
>>
>> The model we have been operating under is that setup work of the actual
>> virtual machine image is done prior to launching.
>>
> 
> A few more questions:
> 
> - If the user initiates a stop of an HA VM, does OE have to coordinate
> that with CAPE? Terminate CAPE as well?
> 

There is another process called a CPE (cloud policy engine) which
provides a REST API for start/stop of instances.  This process starts
and stops the CAPE processes as necessary.

> - How does CAPE make the decision that it is 'safe' to restart the
> resource?

When monitoring fails in some way we terminate the node via deltacloud.

> For example, currently if OE loses the VM heartbeat but we have the
> host heartbeat we know that it is safe to restart the VM. If we lose
> the host heartbeat (which implies we lose the VM heartbeat as well)
> we do not start the VM until we fence the host (or the user manually
> confirms that the host was rebooted).
> 

This particular use case could be handled with a bit of extra code on
our end.  Use case seems reasonable.
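
To make sure I read the rule correctly, here is a minimal sketch of the
decision described above. It is not pacemaker-cloud or OE code: the names
Action and decide_recovery and the three boolean inputs are hypothetical,
chosen only to illustrate the heartbeat logic.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical recovery actions, for illustration only."""
    NONE = "none"
    RESTART_VM = "restart_vm"
    FENCE_HOST_FIRST = "fence_host_first"


def decide_recovery(vm_heartbeat_ok: bool,
                    host_heartbeat_ok: bool,
                    host_fenced_or_confirmed: bool = False) -> Action:
    """Decide what to do with a VM based on the two heartbeats."""
    if not host_heartbeat_ok:
        # Losing the host heartbeat implies the VM heartbeat is lost too.
        # The VM may still be running, so it is only safe to restart it
        # after the host was fenced or a user confirmed the reboot.
        if host_fenced_or_confirmed:
            return Action.RESTART_VM
        return Action.FENCE_HOST_FIRST
    if not vm_heartbeat_ok:
        # Host is alive and reports no VM: safe to restart right away.
        return Action.RESTART_VM
    return Action.NONE
```

The host check comes first on purpose, so that a stale VM heartbeat flag
can never cause a restart while the host state is unknown.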

> 
> - Currently OE is monitoring the VMs to collect statistics (CPU,
> memory, network usage etc.). If OE uses CAPE for providing HA of VMs (or
> services), it won't 'save' OE the need to monitor the VMs for statistics.
> So if the purpose of this integration is to help with OE scalability,
> don't we need to take care of the monitoring of the VM statistics as well?
>

We support multiple transport mechanisms, each in a separate cape binary.
Please have a look at

http://www.pacemaker-cloud.org/downloads/cape-ovirt.pdf

This shows how ovirt support could be added by pacemaker cloud devs.
Essentially ovirt.o would communicate with current ovirt monitoring
infrastructure via whatever method makes the most sense.  The operations
that trans_ssh.o, matahari.o, or ovirt.o need are vm healthcheck,
resource start, stop, monitor.
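
As a rough illustration of that set of operations, here is a sketch of
what a transport would have to provide. The real transports are object
files in the cape build, not Python; the Transport and FakeTransport
classes and their method signatures below are hypothetical and exist only
to show the shape of the four operations.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical interface: the four operations a transport needs."""

    @abstractmethod
    def vm_healthcheck(self, vm_id: str) -> bool:
        """Return True if the VM is considered healthy."""

    @abstractmethod
    def resource_start(self, vm_id: str, resource: str) -> None:
        """Start a resource (service) inside the VM."""

    @abstractmethod
    def resource_stop(self, vm_id: str, resource: str) -> None:
        """Stop a resource (service) inside the VM."""

    @abstractmethod
    def resource_monitor(self, vm_id: str, resource: str) -> bool:
        """Return True if the resource is currently running."""


class FakeTransport(Transport):
    """In-memory stand-in, useful only for exercising the interface."""

    def __init__(self, healthy_vms):
        self.healthy_vms = set(healthy_vms)
        self.running = set()  # (vm_id, resource) pairs

    def vm_healthcheck(self, vm_id):
        return vm_id in self.healthy_vms

    def resource_start(self, vm_id, resource):
        self.running.add((vm_id, resource))

    def resource_stop(self, vm_id, resource):
        self.running.discard((vm_id, resource))

    def resource_monitor(self, vm_id, resource):
        return (vm_id, resource) in self.running
```

An oVirt transport would implement the same four calls on top of the
existing oVirt monitoring infrastructure, by whatever method fits best.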

Regards
-steve

> Livnat
> 
>> Physical resource mappings (such as LUNs or block storage) are again the
>> domain of the IAAS platform.
>>
>> Note we have had some informal requests to also handle scheduling, but
>> would need topology information about the physical resources available
>> in order to make those decisions.  Currently there is no "standardized"
>> way to determine the topology.  We don't tackle this problem (currently)
>> in our implementation.  The project is only focused on HA.
>>
>> Regards
>> -steve
>>
>>>
>>>> The above example seems less like an HA concern and more of a general
>>>> resource scheduling concern.  I think using the Pacemaker Rules engine
>>>> with pcmk-cloud, this should be possible as well, but I'll let
>>>> Andrew/Steve comment further on that.
>>>>
>>>> Perry
>>>
>>
>