As one of my priorities on a virtualization platform is to offer HA, I
wanted to know how it works on the oVirt architecture. I mean, if my
management node fails, is HA still running on the oVirt Nodes (is it
distributed), or is it manager dependent?
Right now if the oVirt Engine server fails, HA of the guests running on
oVirt Nodes will not work. This is because the oVirt Engine is what
coordinates monitoring and restart of the guests marked as HA.
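
To make that dependency concrete, here is a toy model in Python. It is
only a sketch of the behaviour described above, not oVirt code: the
Guest and Engine classes and the monitor() method are hypothetical
names made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Guest:
    name: str
    ha: bool               # guest is marked as highly available
    running: bool = True


@dataclass
class Engine:
    """Hypothetical stand-in for the management node (not the real oVirt Engine API)."""
    up: bool = True

    def monitor(self, guests: List[Guest]) -> List[str]:
        """Restart failed HA guests and return the names that were restarted."""
        if not self.up:
            # Nothing is coordinating restarts while the engine is down.
            return []
        restarted = []
        for guest in guests:
            if guest.ha and not guest.running:
                guest.running = True
                restarted.append(guest.name)
        return restarted


if __name__ == "__main__":
    guests = [Guest("db", ha=True, running=False),
              Guest("scratch", ha=False, running=False)]
    print(Engine(up=False).monitor(guests))  # engine down: no restarts happen
    print(Engine(up=True).monitor(guests))   # engine up: only the HA guest restarts
```

The point of the sketch is the early return: once the engine is down,
a failed HA guest simply stays down until the engine comes back.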
Today, the best way to protect against that double-failure is to provide
HA for the oVirt Engine itself. This can be done by setting up a two-node
HA cluster via an HA stack like Pacemaker or RHEL Clustering. Pacemaker
is in lots of distributions, so this is a fairly ubiquitous way of
providing HA for non-HA aware services.
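
The placement decision such a two-node active/passive setup makes can
be sketched roughly as follows. This is a toy illustration in Python,
not Pacemaker configuration or its API; place_engine() and the node
names are hypothetical.

```python
from typing import Dict, Optional


def place_engine(health: Dict[str, bool], current: Optional[str]) -> Optional[str]:
    """Pick the node that should run the engine service.

    health maps node name -> whether the node is currently healthy.
    current is the node running the engine now (None if nowhere).
    """
    # Stay put while the active node is healthy, to avoid needless moves.
    if current is not None and health.get(current, False):
        return current
    # Otherwise fail over to the first healthy node, if there is one.
    for node, ok in health.items():
        if ok:
            return node
    # Both nodes are down: the engine cannot run anywhere.
    return None


if __name__ == "__main__":
    print(place_engine({"node-a": True, "node-b": True}, "node-a"))
    print(place_engine({"node-a": False, "node-b": True}, "node-a"))
```

A real HA stack also has to handle fencing and shared or replicated
storage for the engine database, which this sketch leaves out entirely.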
In the future, the goal is to make the oVirt Engine HA aware via
something similar to JBoss clustering combined with database
replication/clustering. This will remove the need for a separate HA stack.
Also, my understanding is that the roadmap for vdsm is to provide it
with more intelligence/policies so that it can take care of some of the
HA features even in the absence of the oVirt Engine running.
The enhanced vdsm for policy/HA is a roadmap item, as is making the
Engine HA aware. We could certainly use help implementing those items :)