Some thoughts on enhancing High Availability in oVirt

Wed Feb 15 11:12:44 UTC 2012

On 15/02/12 09:03, Ayal Baron wrote:
> 
> 
> ----- Original Message -----
>> On 02/15/2012 01:11 AM, Ayal Baron wrote:
>>>
>>>
>>> ----- Original Message -----
>>>>> I think we first need to look at the larger question of a policy
>>>>> engine in ovirt-engine. The two main candidates are Pacemaker and
>>>>> Drools (JBoss Rules):
>>>>> Pacemaker for having logic in the area.
>>>>> Drools for having easier Java integration and an integrated UI for
>>>>> users to create policies.
>>>>
>>>> Agreed, as I mentioned in my email they're interrelated
>>>
>>> I'm not sure I agree.
>>> This entire thread assumes that the way to do this is to have the
>>> engine continuously monitor all services on all (HA) guests and
>>> according to varying policies reschedule VMs (services within
>>> VMs?)
>>> I don't think this is scalable (and wrt drools/pacemaker, assuming
>>> what Andrew says is correct, drools doesn't even remotely come
>>> close to supporting even relatively small scales)
>>>
>>> Engine should decide on policy, the hosts should enforce it.
>>> What this would translate to is a more distributed way of
>>> monitoring and moving around of VMs/services.  E.g. for each
>>> service, engine would run the VM on host A and let host B know
>>> that it is the failover node for this service.  Node B would be
>>> monitoring the heartbeats for the services it is in charge of and
>>> take over when needed. In case host B crashes, engine would choose
>>> a different host to be the failover node (note that there can be
>>> more than 2 nodes with a predefined order of priority).
>>
>> HA is a simple use case of policy.
> 
> *Today* HA is simply 'if VM is down restart it' but what Perry was suggesting was to improve this to something more robust.

I think that the main concept of what Perry suggested (leaving the
implementation details aside :)) is to add HA of services.

I like this idea and I would like to extend it a little bit.
How about services that are spread across more than a single VM?
I would like to be able to define a service, specify which VM(s)
provide this service, and add an HA flag on the service.

Then I would like to manage policies around it: I define a service
with 3 VMs providing this service and I want to have at least 2 VMs
running it at any given time. (Now the VMs themselves are not highly
available; only the service is.)
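
To make this a bit more concrete, here is a rough sketch in Python of
the kind of policy check I have in mind. All names here (Service,
min_running, vms_to_start) are made up for illustration; none of them
are existing oVirt or VDSM APIs, and choosing which stopped VM to start
by list order is just an assumption to keep the example small.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Service:
    """Hypothetical service definition: its VMs plus an HA policy."""
    name: str
    vms: List[str]            # VMs that provide this service
    min_running: int          # policy: minimum number of VMs that must be up
    highly_available: bool = True


def vms_to_start(service: Service, running: Set[str]) -> List[str]:
    """Return the VMs to start so the service meets its min_running policy.

    `running` holds the names of VMs currently up. Stopped VMs are chosen
    in the order they appear in the service definition (an assumption).
    """
    if not service.highly_available:
        return []
    up = [vm for vm in service.vms if vm in running]
    shortfall = service.min_running - len(up)
    if shortfall <= 0:
        return []  # policy already satisfied, nothing to do
    down = [vm for vm in service.vms if vm not in running]
    return down[:shortfall]


# A service backed by 3 VMs, of which at least 2 must be running.
web = Service(name="web", vms=["web1", "web2", "web3"], min_running=2)
print(vms_to_start(web, {"web1", "web2"}))  # policy met, nothing to start
print(vms_to_start(web, {"web3"}))          # one VM short of the policy
```

With only web3 up, the check asks for one more VM; with two or more up,
it asks for nothing, even though an individual VM is down.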

> 
>> load balancing/power saving is something more continuous which
>> requires constant global view of workload, could be schedule based,
>> etc.
> 
> power saving is a specific load balancing policy.  Once policy changes (either manually or automatically) then it is engine's job to reshuffle the deck (move VMs around, designate new failover nodes, etc).
> There is no question that the engine should periodically get the state of all the VMs / services it is managing (where it is running etc), but HA decisions need to consider a lot more data and are of finer granularity than general VM placement (health check frequency, intra-vm services monitoring, etc).
> 
>>
>>
>>>
>>>>
>>>> i.e. if you're going to use Pacemaker's policy engine then it
>>>> absolutely makes sense to just go with Pacemaker Cloud, since
>>>> that's precisely what it does (uses the core Pacemaker PE)
>>>>
>>>> OTOH, if you decide to use drools, then it may make more sense to
>>>> integrate the HA concepts directly into the drools PE and then the
>>>> only other thing you can leverage would be the library that does
>>>> the monitoring of services at the end points.
>>>> _______________________________________________
>>>> Arch mailing list
>>>> Arch at ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/arch
>>>>
>>
>>