Some thoughts on enhancing High Availability in oVirt
Ayal Baron
abaron at redhat.com
Wed Feb 15 06:32:31 UTC 2012
----- Original Message -----
> > I'm not sure I agree.
> > This entire thread assumes that the way to do this is to have the
> > engine continuously monitor all services on all (HA) guests and
> > according to varying policies reschedule VMs (services within VMs?)
>
> That's one interpretation of what I wrote, but not the only one.
>
> Pacemaker Cloud doesn't rely on a single process (like oVirt Engine) to
> monitor all VMs and the services in those VMs. It relies on spawning a
> monitor process for each logical grouping of VMs in an 'application
> group'.
>
> So the engine doesn't need to continuously monitor every VM and every
> service, it delegates to the Cloud Policy Engine (CPE) which in turn
> creates a daemon (DPE[1]) to monitor each application group.
Where is the daemon spawned? On the engine, or in a distributed fashion? If the latter, then drools is irrelevant. If the former, then it would just make things worse (scalability-wise).
>
> > I don't think this is scalable (and wrt drools/pacemaker, assuming
> > what Andrew says is correct, drools doesn't even remotely come close
> > to supporting even relatively small scales)
>
> The way to deal with drools' poor scaling is... don't use drools :)
>
> But you're right, having oVirt Engine be the sole entity for monitoring
> every service on every VM is not scalable, which is the reason why the
> Pacemaker Cloud architecture doesn't do it that way.
>
> > Engine should decide on policy, the hosts should enforce it.
>
> This is how Pacemaker Cloud works as well, except right now I'd restate
> it as: Engine should decide on policy and the DPEs should enforce it.
>
> In the current thinking the DPEs run co-located with the CPE, which
> would run nearby (but not necessarily on the same server as) the oVirt
> Engine.
>
> However, you bring up a good point in that the DPEs could be distributed
> to the hosts. (Right now CPE/DPE communication uses IPC but this could
> be replaced with something TCP oriented)
>
> Note: Not relying on anything from the host was a design constraint for
> Pacemaker Cloud. oVirt is different in that you can put things on the
> hosts, so there may be optimizations we can make due to this relaxed
> constraint, like putting the DPEs onto the Hosts.
>
> > What this would translate to is a more distributed way of monitoring
> > and moving around of VMs/services. E.g. for each service, engine
> > would run the VM on host A and let host B know that it is the
> > failover node for this service.
>
> That seems restrictive. Why not allow that VM to fail over to 'any
> other node in the cloud' vs. picking a specific piece of hardware? If
> you allow it to just pick the best available node at the time using
> predefined policies that will result in less focus on the individual
> hosts and make things more cloud-like (abstraction of resources)
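
To make the "pick the best available node using predefined policies" idea concrete, here is a minimal sketch in Python. It is only an illustration: the Host fields, the pick_failover_host name and the "fewest running VMs, then most free memory" policy are assumptions made up for this example, not something Pacemaker Cloud or oVirt Engine is known to implement.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Host:
    # All fields are hypothetical; a real scheduler would track far more.
    name: str
    healthy: bool
    free_mem_mb: int
    running_vms: int


def pick_failover_host(hosts: Iterable[Host], failed_host: str,
                       required_mem_mb: int) -> Optional[Host]:
    """Return the best candidate host, or None if no host qualifies."""
    candidates = [
        h for h in hosts
        if h.healthy
        and h.name != failed_host
        and h.free_mem_mb >= required_mem_mb
    ]
    if not candidates:
        return None
    # Assumed policy: prefer the host with the fewest running VMs,
    # breaking ties by the largest amount of free memory.
    return min(candidates, key=lambda h: (h.running_vms, -h.free_mem_mb))


hosts = [
    Host("hostA", True, 8192, 3),
    Host("hostB", True, 4096, 1),
    Host("hostC", False, 16384, 0),  # unhealthy, never chosen
    Host("hostD", True, 1024, 0),    # too little free memory for the VM
]
best = pick_failover_host(hosts, failed_host="hostA", required_mem_mb=2048)
```

The point is that the target is computed at failover time from the current state of the hosts rather than being fixed up front.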
>
> > Node B would be monitoring the
> > heartbeats for the services it is in charge of and take over when
> > needed. In case host B crashes, engine would choose a different host
> > to be the failover node (note that there can be more than 2 nodes
> > with a predefined order of priority).
>
> Agree with this... Sort of what I said above, the DPE could run on HostB
> to monitor stuff running on Hosts A and C (for case where there are
> multiple VMs across different hosts in an application group). And if
> the DPE or HostB fails, then the CPE would respawn a new DPE on a new
> host.
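
A rough sketch of the heartbeat-plus-ordered-failover scheme discussed above, in Python. Everything here is hypothetical: the FailoverMonitor class, its method names and the timeout value are assumptions for illustration and do not reflect the actual DPE/CPE code. The monitor records heartbeats per service, reports services whose heartbeat has timed out, and walks a predefined priority list to find the next failover host.

```python
import time


class FailoverMonitor:
    """Tracks service heartbeats and an ordered list of failover hosts."""

    def __init__(self, failover_order, timeout_s=10.0, clock=time.monotonic):
        self.failover_order = list(failover_order)  # highest priority first
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # service name -> time of last heartbeat

    def heartbeat(self, service):
        """Record that a heartbeat was just received from the service."""
        self.last_seen[service] = self.clock()

    def expired(self):
        """Return the services whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(s for s, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)

    def next_host(self, down_hosts):
        """Return the first host in priority order that is not known to be down."""
        for host in self.failover_order:
            if host not in down_hosts:
                return host
        return None


# Example with a fake clock so no real waiting is needed.
now = [0.0]
monitor = FailoverMonitor(["hostB", "hostC"], timeout_s=5.0, clock=lambda: now[0])
monitor.heartbeat("db")
monitor.heartbeat("web")
now[0] = 4.0
monitor.heartbeat("web")   # web keeps reporting, db goes silent
now[0] = 6.0
stale = monitor.expired()
```

If the host running the monitor itself dies, something above it (the engine or CPE in this thread) still has to notice and start a new monitor elsewhere; the sketch does not cover that part.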
>
> I think Pacemaker Cloud could fit the paradigm you're looking for here.
> But it will require a little integration work. On the other hand, if
> you are looking to keep this more Java oriented or very tightly
> integrated with the oVirt codebase, then you could probably take similar
> concepts as what has already been done in pcmk-cloud and re-implement
> them.
>
> Either way works. We'd be happy to assist either with integration of
> pcmk-cloud here or with general advice on HA as you work on the Java
> implementation.
>
> Perry
>
>
> [1] This daemon right now is called the DPE for Deployable Policy
> Engine, since in the Aeolus terminology a Deployable was a set of
> VMs that were coordinated to run an application. For example, 2
> VMs, one running a database and the other running a web server.
>
> Aeolus terminology has changed and 'Deployable' is no longer used
> to describe this. Instead this is called an Application Set/Group.
>
> Because pcmk-cloud adopted Aeolus terminology and the Deployable
> term is not really well known, we're probably going to rename the
> DPE to be "Cloud Application Policy Engine" or CAPE.
>