On 11/04/2011 05:46 PM, José Román Bilbao wrote:
Thanks for the info. I am committed to helping with this, but I don't know
where to start... Can you point me in the right direction?
Depends on what your short term priority is. Could be any one of the
following areas:
* Helping to write an upstream whitepaper/wiki on using standard
clustering solutions to make oVirt Engine HA. Brandon Perkins from Red
Hat wrote something up specific to RHEL; perhaps you could help him with
a POC using Pacemaker in a distro like Ubuntu or SLES. (Brandon said
he'd handle the writeup of Fedora HA + oVirt Engine.)
* Helping the oVirt Engine team with leveraging app server clustering to
make the engine HA aware. I don't know the details here, so your best
path is to join the engine-devel mailing list and start submitting small
patches on the engine in order to better learn the codebase so that you
can then help with larger features like this.
* Helping the vdsm team with implementing node policy for moving some HA
features to the endpoints instead of needing the engine. Joining
vdsm-devel(a)lists.fedorahosted.org and starting to learn the codebase
there by submitting some small patches to start with is a good idea.
Basically, start small. Push some patches upstream to show that you
have a good understanding of the codebase, and then start writing up
proposals for the upstream list that outline how you think best to
tackle some of these more advanced features. Certainly, implementing
engine HA or vdsm policy is not a trivial exercise, so it will involve
input and development from many people.
Hope that helps
Perry
Thanks,
Jode
On 04/11/2011 17:11, "Perry Myers" <pmyers(a)redhat.com> wrote:
> As one of my priorities on a virtualization platform is to offer HA, I
> wanted to know how it works in the oVirt architecture. I mean, if my
> management node fails, is HA still running on the oVirt Nodes (is it
> distributed) or is it manager dependent?
Right now if the oVirt Engine server fails, HA of the guests running on
oVirt Nodes will not work. This is because the oVirt Engine is what
coordinates monitoring and restart of the guests marked as HA.
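
To make that dependency concrete, here is a toy model in Python. It is only a sketch of the idea, not oVirt Engine code: the names Guest, Engine and monitor_once are hypothetical, and the real engine does far more than this.

```python
from dataclasses import dataclass, field


@dataclass
class Guest:
    name: str
    ha: bool              # True if the guest is marked as HA
    running: bool = True


@dataclass
class Engine:
    """Hypothetical stand-in for a central manager that watches guests."""
    alive: bool = True
    guests: list = field(default_factory=list)

    def monitor_once(self):
        """Restart every HA guest found down; return the names restarted."""
        if not self.alive:
            # With the manager down, nothing is monitoring the guests.
            return []
        restarted = []
        for guest in self.guests:
            if guest.ha and not guest.running:
                guest.running = True   # stands in for a real restart
                restarted.append(guest.name)
        return restarted
```

In this model a failed HA guest comes back only while the engine is alive. Once `alive` is False, a second failure goes unnoticed, which is the double-failure case described next.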
Today, the best way to protect against that double-failure is to provide
HA for the oVirt Engine itself. This can be done by setting up a two-node
HA cluster via an HA stack like Pacemaker or RHEL Clustering. Pacemaker
is in lots of distributions, so this is a fairly ubiquitous way of
providing HA for non-HA aware services.
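
As a rough illustration of what an active/passive two-node setup does for a service that is not HA aware, the Python sketch below picks which node should run the service based on heartbeat age. This is an assumption-laden toy, not Pacemaker's actual logic: pick_active and its parameters are hypothetical, and a real HA stack also deals with fencing and quorum, which this omits.

```python
def pick_active(last_heartbeat, now, timeout, preferred):
    """Return the node that should run the service, or None.

    last_heartbeat: dict mapping node name -> time of its last heartbeat
    now:            current time, in the same units
    timeout:        maximum heartbeat age before a node counts as failed
    preferred:      node names in order of preference (active first)
    """
    for node in preferred:
        seen = last_heartbeat.get(node)
        # A node is healthy if it has reported recently enough.
        if seen is not None and now - seen <= timeout:
            return node
    return None  # no healthy node left to host the service


# Hypothetical usage: node1 has gone quiet, so the service moves to node2.
heartbeats = {"node1": 80, "node2": 100}
active = pick_active(heartbeats, now=105, timeout=10,
                     preferred=["node1", "node2"])
```

The service itself needs no HA knowledge here; the cluster layer alone decides where it runs, which is why this approach works for the engine today.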
In the future, the goal is to make the oVirt Engine HA aware via
something similar to JBoss clustering combined with database
replication/clustering. This will remove the need for a separate HA
stack.
Also, my understanding is that the roadmap for vdsm is to provide it
with more intelligence/policies so that it can take care of some of the
HA features even in the absence of the oVirt Engine running.
The enhanced vdsm for policy/HA is a roadmap item, as is making the
Engine HA aware. We could certainly use help implementing those
items :)