Some thoughts on enhancing High Availability in oVirt

Wed Feb 15 07:03:52 UTC 2012

----- Original Message -----
> On 02/15/2012 01:11 AM, Ayal Baron wrote:
> >
> >
> > ----- Original Message -----
> >>> I think we first need to look at the larger question of policy
> >>> engine at
> >>> ovirt-engine. the two main candidates are pacemaker and drools
> >>> (jboss
> >>> rules).
> >>> pacemaker for having logic in the area.
> >>> drools for having easier java integration and integrated UI to
> >>> create
> >>> policies by users.
> >>
> >> Agreed, as I mentioned in my email they're interrelated
> >
> > I'm not sure I agree.
> > This entire thread assumes that the way to do this is to have the
> > engine continuously monitor all services on all (HA) guests and
> > according to varying policies reschedule VMs (services within
> > VMs?)
> > I don't think this is scalable (and wrt drools/pacemaker, assuming
> > what Andrew says is correct, drools doesn't even remotely come
> > close to supporting even relatively small scales)
> >
> > Engine should decide on policy, the hosts should enforce it.
> > What this would translate to is a more distributed way of
> > monitoring and moving around of VMs/services.  E.g. for each
> > service, engine would run the VM on host A and let host B know
> > that it is the failover node for this service.  Node B would be
> > monitoring the heartbeats for the services it is in charge of and
> > take over when needed. In case host B crashes, engine would choose
> > a different host to be the failover node (note that there can be
> > more than 2 nodes with a predefined order of priority).
> 
> HA is a simple use case of policy.

*Today* HA is simply 'if the VM is down, restart it', but what Perry was suggesting was to improve this into something more robust.

> load balancing/power saving is something more continuous which
> requires
> constant global view of workload, could be schedule based, etc.

Power saving is a specific load-balancing policy.  Once the policy changes (either manually or automatically), it is the engine's job to reshuffle the deck (move VMs around, designate new failover nodes, etc.).
There is no question that the engine should periodically get the state of all the VMs / services it is managing (where each one is running, etc.), but HA decisions need to consider a lot more data and are of finer granularity than general VM placement (health check frequency, intra-VM service monitoring, etc.).
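
To make the split between "engine decides on policy, hosts enforce it" a bit more concrete, here is a rough Python sketch of the failover-node idea quoted above.  It is only an illustration under my own assumptions: the class names (ServicePolicy, FailoverNode, Engine), the heartbeat timeout parameter and the explicit "now" timestamps are all hypothetical and do not correspond to any existing oVirt, VDSM or Pacemaker API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServicePolicy:
    """Policy the engine hands out for one HA service (hypothetical structure)."""
    service: str
    primary: str                # host currently running the VM/service
    failover_order: List[str]   # failover hosts, highest priority first
    heartbeat_timeout: float    # seconds of silence before a takeover


class FailoverNode:
    """Host side: enforces the policy by watching heartbeats locally."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.policies: Dict[str, ServicePolicy] = {}
        self.last_seen: Dict[str, float] = {}

    def assign(self, policy: ServicePolicy, now: float) -> None:
        # The engine tells this host it is the failover node for the service.
        self.policies[policy.service] = policy
        self.last_seen[policy.service] = now

    def heartbeat(self, service: str, now: float) -> None:
        # Record a heartbeat received from a service this host backs up.
        if service in self.policies:
            self.last_seen[service] = now

    def check(self, now: float) -> List[str]:
        """Return the services this node takes over at time `now`."""
        taken = []
        for service, policy in self.policies.items():
            silent = now - self.last_seen[service]
            if policy.primary != self.name and silent > policy.heartbeat_timeout:
                policy.primary = self.name  # local decision, no engine round trip
                taken.append(service)
        return taken


class Engine:
    """Engine side: decides placement and failover order, does not poll services."""

    def __init__(self, hosts: List[FailoverNode]) -> None:
        self.hosts = {h.name: h for h in hosts}
        self.policies: Dict[str, ServicePolicy] = {}

    def place(self, service: str, primary: str, failover_order: List[str],
              timeout: float, now: float) -> ServicePolicy:
        policy = ServicePolicy(service, primary, list(failover_order), timeout)
        self.policies[service] = policy
        # Only the highest-priority failover host actively monitors.
        self.hosts[failover_order[0]].assign(policy, now)
        return policy

    def host_crashed(self, name: str, now: float) -> None:
        """Drop a crashed host; promote the next failover node where needed."""
        self.hosts.pop(name, None)
        for policy in self.policies.values():
            if name in policy.failover_order:
                was_active = policy.failover_order[0] == name
                policy.failover_order.remove(name)
                if was_active and policy.failover_order:
                    self.hosts[policy.failover_order[0]].assign(policy, now)
```

For example, after engine.place("web", "A", ["B", "C"], 5.0, now=0.0), host B calls check() on its own schedule and takes over "web" once heartbeats have been silent for more than 5 seconds; if B itself crashes first, engine.host_crashed("B", ...) hands the monitoring role to C.  The sketch deliberately leaves out fencing, designating a fresh failover node after a takeover, and intra-VM service checks.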

> 
> 
> >
> >>
> >> i.e. if you're going to use Pacemaker's policy engine then it
> >> absolutely
> >> makes sense to just go with Pacemaker Cloud, since that's
> >> precisely
> >> what
> >> it does (uses the core Pacemaker PE)
> >>
> >> OTOH, if you decide to use drools, then it may make more sense to
> >> integrate the HA concepts directly into the drools PE and then the
> >> only
> >> other thing you can leverage would be the library that does the
> >> monitoring of services at the end points.
> >> _______________________________________________
> >> Arch mailing list
> >> Arch at ovirt.org
> >> http://lists.ovirt.org/mailman/listinfo/arch
> >>
> 
>