Some thoughts on enhancing High Availability in oVirt

Ayal Baron abaron at redhat.com
Thu Feb 16 16:14:57 UTC 2012



----- Original Message -----
> On 02/14/2012 11:32 PM, Ayal Baron wrote:
> > 
> > 
> > ----- Original Message -----
> >>> I'm not sure I agree.
> >>> This entire thread assumes that the way to do this is to have the
> >>> engine continuously monitor all services on all (HA) guests and,
> >>> according to varying policies, reschedule VMs (services within
> >>> VMs?).
> >>
> >> That's one interpretation of what I wrote, but not the only one.
> >>
> >> Pacemaker Cloud doesn't rely on a single process (like oVirt
> >> Engine) to monitor all VMs and the services in those VMs.  It
> >> relies on spawning a monitor process for each logical grouping of
> >> VMs in an 'application group'.
> >>
> >> So the engine doesn't need to continuously monitor every VM and
> >> every service; it delegates to the Cloud Policy Engine (CPE),
> >> which in turn creates a daemon (DPE[1]) to monitor each
> >> application group.
> > 
> > Where is the daemon spawned? On the engine or in a distributed
> > fashion? If the latter, then Drools is irrelevant.  If the former,
> > then it would just make things worse (scalability-wise).
> > 
> 
> Ayal,
> 
> CPE (cloud policy engine - responsible for starting/stopping cloud
> application policy engines, and providing an API for third-party
> control) runs on the same machine as the CAPE (aka DPE) (cloud
> application policy engine - responsible for maintaining the
> availability of the resources and virtual machines in one cloud
> application, including recovery escalation, ordering constraints,
> fault detection, fault isolation, and instantiation of VMs).  This
> collection of software components could be collocated with the
> engine, or run on a separate machine entirely, since the project
> provides an API to third-party projects.
> 
> One thing that may not be entirely clear is that there is a new DPE
> process for each cloud application (which could be monitoring several
> hundred VMs for a large application).  This converts the inherent
> inability of any single policy engine to scale to large object counts
> into a kernel scheduling problem and a memory consumption problem
> (the kernel.org scheduler rocks, and memory is cheap).
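
If I'm reading that right, the model is roughly the following.  Here is
a hypothetical Python sketch, just to check my understanding (the real
pacemaker-cloud code is C, and every name below is made up):

import multiprocessing
import time


def check_vm_health(vm_id):
    # Stand-in for a real probe (agent heartbeat, QMF query, ...);
    # here it just pretends everything is healthy.
    return True


def recover_vm(vm_id):
    # Stand-in for recovery escalation (restart the VM, then the host, ...).
    print("recovering", vm_id)


def monitor_application_group(group_name, vm_ids, poll_interval=5):
    # One CAPE/DPE-like process: it only ever looks at the VMs of a
    # single application group and handles recovery locally.
    while True:
        for vm in vm_ids:
            if not check_vm_health(vm):
                recover_vm(vm)
        time.sleep(poll_interval)


def spawn_monitors(application_groups):
    # CPE-like role: start one monitor process per application group,
    # so no single process ever tracks the whole cloud.
    monitors = []
    for name, vm_ids in application_groups.items():
        p = multiprocessing.Process(
            target=monitor_application_group, args=(name, vm_ids))
        p.daemon = True
        p.start()
        monitors.append(p)
    return monitors


if __name__ == "__main__":
    groups = {"app-%d" % i: ["vm-%d-%d" % (i, j) for j in range(200)]
              for i in range(10)}
    spawn_monitors(groups)
    time.sleep(30)  # let the monitors run for a while, then exit
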
> 
> The CAPE processes could be spawned in a distributed fashion very
> trivially if/when we run into scaling problems with a single node.
> No sense optimizing for a condition that may not be relevant.
> 
> One intentional aspect of our project is its focus on reliability.
> Our CAPE process is approximately 2 kloc.  Its very small code
> footprint is designed to be easy to "get right", versus a huge
> monolithic code base, which increases the possible failure scenarios.
> 
> As a short note about scalability, my laptop can run 1000 CAPE
> processes with 1% total CPU utilization (measured with top) and 5 GB
> of memory utilization (measured with free).  The design's upper limit
> on scale is based upon a) the limitations of kernel scheduling and
> b) the memory consumption of the CAPE process.
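
(As an aside, a quick back-of-the-envelope way to sanity-check numbers
like that - this is just a hypothetical Python harness, not your
benchmark - is to spawn a thousand idle processes and diff what free
reports before and after:)

import multiprocessing
import subprocess
import time


def idle_worker():
    # Stand-in for a CAPE process that is mostly asleep between polls.
    while True:
        time.sleep(5)


def used_memory_kb():
    # Read the "used" column from free(1), the same number you would
    # eyeball by hand.
    out = subprocess.check_output(["free", "-k"]).decode()
    return int(out.splitlines()[1].split()[2])


if __name__ == "__main__":
    before = used_memory_kb()
    workers = [multiprocessing.Process(target=idle_worker)
               for _ in range(1000)]
    for w in workers:
        w.daemon = True
        w.start()
    time.sleep(10)  # let everything settle
    after = used_memory_kb()
    print("approx. memory cost: %.1f MiB" % ((after - before) / 1024.0))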

But they all schedule their services to run on the same set of resources (hosts / memory / CPU); how do you coordinate between them?

> 
> Regards
> -steve
> 


