In the two years that I have been using oVirt, I've been yearning for some nice
architecture primer myself, but I have not been able to find a nice "textbook
style" architecture document.
And it does not help that some of the more in-depth information on the oVirt site
doesn't seem navigable from the main "Documentation" link.
This is my very personal opinion, others might have different impressions.
oVirt isn't really a product in the sense that all parts were designed to go together.
Instead it's a package that has been assembled from quite a few rather distinct pieces
of technology that Red Hat has acquired over the last decades. What some might view as proof
of extreme flexibility, others will see as lack of integration. The oVirt team is careful
not to re-implement anything on their side that some other component already delivers.
Unfortunately, that means you had better understand those components underneath, their range
of functionality and the tooling around them, because the oVirt guys won't explain what
other teams do (e.g. KVM, Gluster, VDO, LVM, Ansible).
KVM originates from Moshe Bar's Qumranet, is the key ingredient of oVirt/RHV but also
leads a somewhat independent life on its own.
Gluster was a separate scale-out storage company that Red Hat acquired, which has been
passing through its very own trials and tribulations and suffers from lack of large scale
adoption, especially since scale-out is either cloud or HPC where Gluster seems to hold
little appeal. I think it's stagnating and its level of integration with oVirt is
really minimal, even with the tons of work developers have done. I consider oVirt's
HCI a brilliant value proposition in theory and a half baked implementation that one
certainly should not use "for the entire enterprise".
VDSM is core to oVirt and its philosophical principle AFAIK has remained unchanged over the
last ten years. Its approach also isn't exactly novel (but solid!): I see parallels
not only with vSphere but right back to things like the Tivoli Workload Scheduler, a
mainframe batch scheduler going back decades.
The working principle is to make a deployment plan (~batch schedule), engrave that plan
into persistent shared storage, so that every worker (host) can follow this optimal and
conflict-free plan, while the manager is free to rest or die or be rebooted. It relieves
the manager of having to be clustered for availability itself.
VDSM is the agent on every host (or node) responsible for reading and running that plan
and the engine is the manager, which continuously creates the new plans.
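To make that principle a bit more concrete, here is a toy sketch in Python. It is emphatically not how VDSM or the engine are implemented: the file name `plan.json`, the functions `publish_plan` and `vms_for_host`, and the host and VM names are all made up for illustration, and a temporary directory stands in for the shared storage. It only shows the idea that once the plan is persisted, every host can work out its own share without the manager being alive.

```python
import json
import os
import tempfile
from pathlib import Path


def publish_plan(shared_dir: Path, assignments: dict[str, str]) -> Path:
    """Manager side (hypothetical): persist the plan, VM name -> host name."""
    plan_path = shared_dir / "plan.json"
    # Write to a temporary file first, then rename it into place, so that a
    # reader sees either the old plan or the new one, never a partial write.
    fd, tmp_name = tempfile.mkstemp(dir=shared_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump({"assignments": assignments}, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_name, plan_path)
    return plan_path


def vms_for_host(shared_dir: Path, host: str) -> list[str]:
    """Agent side (hypothetical): read the plan and pick out this host's VMs."""
    with open(shared_dir / "plan.json") as f:
        plan = json.load(f)
    return sorted(vm for vm, h in plan["assignments"].items() if h == host)


def demo() -> dict[str, list[str]]:
    # A temporary directory plays the role of the persistent shared storage.
    with tempfile.TemporaryDirectory() as d:
        shared = Path(d)
        publish_plan(shared, {"vm-db": "host1", "vm-web": "host2", "vm-ci": "host1"})
        # From here on the "manager" is no longer involved: each agent only
        # needs the persisted plan to know what it is supposed to run.
        return {h: vms_for_host(shared, h) for h in ("host1", "host2", "host3")}


if __name__ == "__main__":
    print(demo())
```

The point of the design is visible even in the toy: the manager's only job is to produce a good plan and write it down safely, and the agents never have to ask it anything at run time.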
Originally the manager ran on a separate physical machine, but then somebody managed to get
that "teleported" into a VM running on the oVirt farm itself. It's a very
nice feat and mostly just eliminates the need for a separate host to run the engine. It
also allows for an automatic restart of the management engine on another host, should the
host it ran on fail. But it still needs some special treatment compared to any other VM.
Again this is something VMware and Oracle also managed to do with their VM orchestration
tools; perhaps Nutanix was the first out of the door with that feature.
Looking at what the other vendors do sometimes helps to understand how oVirt
works, because they do copy ideas from each other (and may be shy about documenting
that).
There are architecture presentations out there, which unfortunately mostly describe the
implementation changes made over the years, not the fundamental design philosophy nor the
current implementation state. That's mostly because the implementation has changed
fundamentally and keeps changing rapidly so the effort to maintain up-to-date docs seems
too great. E.g. one of the more recent key efforts has been to make Ansible do as much
work as possible, where the original implementation seems to have used scripts.
But that should not keep someone from doing a textbook on the architecture design
principles behind oVirt, and perhaps a condensed overview about the implementation changes
over the years and their motivations.
One could argue that while Red Hat is an open source company, it's not an open
knowledge company. It doesn't necessarily publish all the internal documentation
and training material it creates for its support engineers. They do want to sell
commercial support.
On the other hand there is conference material, there are lots of RHV-related YouTube
videos scattered around, so you can find a lot of information, just not in that tight nice
little book you and I seem to wish for.
Unfortunately, I also have not encountered any useful oVirt/RHV architecture books. The
ones I found were very much "do this, do that" and didn't help me as a
technical architect.
If I thought that oVirt had a bright and bullish future, I'd be tempted to write such
a book myself.
With vSphere struggling against clouds, I don't see RHV/oVirt doing the right things to
do better in a similar niche, especially when it also tries to sell OpenShift.
And that goes right back to decisions like repositioning CentOS upstream of RHEL.