Red Hat's decision to shut down RHV caught Oracle pretty unprepared, I'd guess: they
had just shut down their own vSphere clone in favor of an RHV clone a couple of years ago.
Oracle is even less vocal about their "Oracle Virtualization" strategy; they
don't even seem to have a proper naming convention or branding.
But for the last few years they have been pushing out OV releases, without a publicly
announced EOL, almost a year behind Red Hat.
And after a 4.4 release in September 2022, a release 4.5 was actually made public a few
days ago, on December 12th.
I've operated oVirt 4.3 with significant quality issues for some years and failed to
make oVirt 4.4 work with any degree of acceptable stability. Oracle's variant of 4.4,
however, proved to be rather better than 4.3 on CentOS7, with no noticeable bugs,
especially in the hyperconverged setup that I am using with GlusterFS.
I assumed that this was because Oracle in fact based their 4.4 on RHV 4.4 and not on oVirt,
but since they're not telling, who knows?
One issue with 4.4 was that Oracle is pushing their UEK (Unbreakable Enterprise Kernel),
which created immediate issues, e.g. VDO modules missing for UEK and other stuff, but that
was solved easily enough by using the RHEL kernel.
With 4.5 Oracle obviously can't use RHV 4.5 as a base, because with RHV declared EOL there
is no such thing. According to Oracle their 4.5 is based on oVirt 4.5.4, which makes the
quality of that release somewhat questionable, but perhaps they have spent the year that
has passed since then productively killing bugs... only to be caught by surprise again,
I presume, by an oVirt 4.5.5 release on December 1st that no one saw coming!
Long story slightly shorter, I've been testing Oracle's 4.5 variant a bit and
it's not without issues.
But much worse, Oracle's variant of oVirt seems to be entirely without any community
that I could find.
Now oVirt has been a somewhat secret society for years, but compared to what's going
on with Oracle this forum is teeming with life!
So did I just not look around enough? Is there a secret lair where all those OV users are
hiding?
Anyhow, here is what I've tested so far and where I'd love to have some feedback:
1. Setting up a three node HCI cluster from scratch using OL8.9 and OV 4.5
Since I don't have extra physical hardware for a 3 node HCI, I'm using VMware
Workstation 17.5 on a workstation running Windows Server 2022, a test platform that has
been working for all kinds of virtualization tests, from VMware ESXi via XCP-ng to oVirt.
I created three VMs with OL8.9 minimal and then installed OV 4.5. I used the default UEK
kernels and then hit an issue when Ansible was trying to create the (local) management
engine: the VM simply could not reach the Oracle repo servers to install the packages
inside the ME. Since that VM is entirely under the control of Ansible and no console
access of any type is possible in that installation phase, I couldn't do diagnostics.
With 4.4 I used to have similar issues, and there switching back to the Red Hat kernel
for the ME (and the hosts) resolved them.
But with 4.5 it seems that UEK has become a baked-in dependency: the OV team doesn't
even seem to do any testing with the Red Hat kernel any more. Or not with the HCI setup,
which was deprecated somewhere in oVirt 4.4... Or not with the Cockpit wizard,
which might be in a totally untested state, or...
Doing the same install on OL 8.9 with OV 4.4, however, did work just fine and I was even
able to update to 4.5 afterwards, which was a nice surprise...
...that I could not repeat on my physical test farm using three Atoms. There, switching to
UEK on the hosts caused issues: hosts became unresponsive and file systems inaccessible,
even though they were perfectly fine at the Gluster CLI level, and in the end the
ME VM simply would no longer start. Switching back to the Red Hat kernel resolved things
there.
In short, switching between the Red Hat kernel and UEK, which should be 100% transparent to
all things userland including hypervisors, doesn't work.
But my attempts to go with a clean install of 4.5 on a Red Hat kernel or UEK are also facing
issues. So far the only things that have worked were a single node HCI install using UEK
and OV 4.5, and upgrading to OV 4.5 on a virtualized triple node OV 4.4 HCI cluster.
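When comparing notes it helps to be sure which kernel a host (or the ME) is actually
booted into. Here is a minimal Python sketch for that. It assumes, as far as I can tell,
that UEK builds carry "uek" in the kernel release string while the Red Hat compatible
kernel does not; the function name is made up for illustration and nothing here comes
from Oracle's tooling.

```python
import platform


def kernel_flavor(release: str) -> str:
    """Classify a kernel release string.

    Assumption: UEK builds contain 'uek' in the release string
    (e.g. '...el8uek.x86_64'); anything else is treated as the
    Red Hat compatible kernel (RHCK).
    """
    return "UEK" if "uek" in release.lower() else "RHCK"


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`
    release = platform.release()
    print(f"{release}: {kernel_flavor(release)}")
```

Run on each host after a reboot, this at least confirms whether a kernel switch really took effect before blaming anything else.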
Anyone else out there trying these things?
I was mostly determined to move to Proxmox VE, but Oracle's OV 4.5 seemed to be
handing a bit of a lifeline to oVirt, and oVirt's base architecture is just much more
powerful (or less manual) than Proxmox, which doesn't have a management engine.