That very much describes my own situation two years ago, with just a slight offset in time
and geography, as my home is near Frankfurt and my work is in Lyon. I had been doing
70:1 consolidation via virtualization based on OpenVZ (containers, but with an IaaS
abstraction) since 2006 because it needed zero hardware and software budget, whereas VMware
would have required both VT-x capable hardware and pricey licenses.
OpenVZ turned out to be extremely easy to use and super reliable, and the sysadmin only had
to learn two or three new commands, not a new abstraction layer. We were running payment
front office systems, where five minutes of downtime means a very angry boss, one hour of
downtime costs a yearly bonus and beyond that you have to gather your things and go.
That also meant planned 100% availability of eventually consistent data, so nothing you
could do on a file level: at the time that meant Oracle with Streams replication and
programmed healing or nothing, because you can't beat the CAP theorem.
(No, this isn't a recommendation to go with OpenVZ: that project is about as lively as
Docker these days).
For a new context with far lighter availability demands but the need to support GPUs for
machine learning (CUDA breaks OpenVZ containers), I thought that oVirt would add managed
heterogeneous VMs and hyperconverged storage to the mix, again at zero entrance fee and
with a supported option if that became necessary.
Two years later, I'd say that "I married the wrong woman". It works, but the
claim "oVirt is an open-source distributed virtualization solution, designed to
manage your entire enterprise infrastructure" is terribly misleading.
I had planned to get it reasonably stable and ready within weeks of my typical stress and
failure testing on 1/3 time budget, but it took almost 2 years and more like 50% time
allocation to learn all the very many things that can go wrong, how to diagnose them and
how to fix them.
Just yesterday one cluster (my first functional test, a 3-node HCI running on Atoms that
mostly just sits idle and gets updates, which require reboots) had evidently decided to
lose one Gluster network connection and accumulated 5000+ entries in the heal queue during
a week of vacation.
It was four hours of careful digging, hundreds of restarted daemons, various reboots and a
transient situation where the three storage nodes could not see or access the storage they
were providing, while the management engine ran on a compute node and continued to write
to a disk that evidently did not exist... As a newbie I would have either jumped off a
cliff (active users) or tossed the project.
However, the magnificent basic design of oVirt had me recover everything without a loss...
except hair, nerves, general health etc.
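A backlog like that is easier to deal with if it gets noticed at 50 entries rather than
5000. What follows is only a minimal sketch of such a check, not something oVirt ships: it
parses the text printed by `gluster volume heal <VOLNAME> info` and flags bricks that are
unreachable or above a threshold. The function names, the threshold and the sample brick
paths are all hypothetical, and the sample text is my assumption about the usual output
layout, so compare it with what your Gluster version actually prints.

```python
import re
import subprocess


def parse_heal_info(text):
    """Map each brick to its pending heal count (None if the count is unreadable).

    Assumes the plain-text layout of `gluster volume heal <VOLNAME> info`:
    a "Brick host:/path" line, optional entries, then "Number of entries: N".
    """
    pending = {}
    brick = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            continue
        match = re.match(r"Number of entries:\s*(\S+)", line)
        if match and brick is not None:
            value = match.group(1)
            # A brick that cannot be reached typically reports "-" here.
            pending[brick] = int(value) if value.isdigit() else None
            brick = None
    return pending


def bricks_needing_attention(pending, threshold=100):
    """Return bricks that are unreadable or have more pending heals than threshold."""
    return sorted(
        brick for brick, count in pending.items()
        if count is None or count > threshold
    )


def heal_backlog(volume):
    """Run the gluster CLI for one volume (needs root on a storage node)."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    )
    return parse_heal_info(result.stdout)


# Hypothetical sample output, used only to exercise the parser.
SAMPLE = """\
Brick node1:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

Brick node2:/gluster_bricks/data/data
/images/some-disk-image
<gfid:00000000-0000-0000-0000-000000000001>
Status: Connected
Number of entries: 2

Brick node3:/gluster_bricks/data/data
Status: Transport endpoint is not connected
Number of entries: -
"""
```

Run from cron on one storage node, something like
`bricks_needing_attention(heal_backlog("data"), threshold=100)` would return the bricks
worth a look, where "data" stands in for whatever your volume is called.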
And I had to learn the hard way, that just exporting a VM as an OVA and expecting it to be
importable on any other platform advertising OVA support, or even back into oVirt, is not
functionality ever included in any regular QA testing...
...which might explain why it doesn't work. Or perhaps just no longer, after an
update.
In short: do not expect anything to work that you have not fully tested after every minor
update, several times over. Everything is extremely raw and ready to break at any moment,
and I can't remember the last time I did a plain vanilla install on freshly scrubbed
hardware where I didn't have to help it along manually, digging through
dozens of big log files.
What really troubles me: since the basic ingredients for the commercial product are the
same, I don't see how that might save your bacon. Perhaps 100% validated hardware
might do it (for a while), but where oVirt is *designed* for a maximum of flexibility, it
won't reward your taking advantage of that.
I am sticking with it until CentOS 7 is end of life, too (just like CentOS 8 is already),
because otherwise I'd have nothing to show for two years of work. But if you want to
join in, you need to have serious resources to commit. The commitment is most likely still
smaller than for OpenStack, though.
And if you have NetApp filers or SAN, you should not risk HCI. That is super elegant as a
concept, just like Gluster is a beautiful concept, but very soon you'll realize that
they were never designed for each other and remain full of contradictions.
oVirt may be designed to fit that enterprise role, but in the HCI variant, it still has
nowhere near the cohesion and maturity you'd need for that role. CentOS, LVM, VDO,
KVM, the management engine, Gluster, Ansible are all distinct products from what used to
be different companies.
And it shows.
Of course, that's just my personal experience and opinion.