On Mon, Feb 7, 2022 at 3:04 PM Sandro Bonazzola
<sbonazzo(a)redhat.com> wrote:
The oVirt storage team never worked on HCI and we don't plan to work on
it in the future. HCI was designed and maintained by Gluster folks. Our
contribution for HCI was adding 4k support, enabling usage of VDO.
Improving on the HCI side is unlikely to come from Red Hat, but nothing
blocks other companies or contributors from working on this.
Our focus for 4.5 is Managed Block Storage and incremental backup.
Nir
Hi Nir,
thank you for the clear message and the confirmation of a suspicion that has been growing
for a while: HCI is an unwanted stepchild, not an equally supported option or even a
strategic direction.
And it's perhaps not the only one: in the meantime I can see VDO and the GUIs getting
neglected, too.
My problem (and potentially that of many others) is that this unbalanced attention
was never communicated or made visible. HCI may have disappeared from the oVirt front
page today (in fact it's hard to find at all these days), but it was very prominent
with 4.3 when I started.
And as a core developer you may not realize this, but the primary first exposure to oVirt
by many (most?) newcomers isn't the command line. It's the Cockpit wizard, where
there are two HCI choices next to the one SAN/NAS option. That option is evidently where
95% of the oVirt team's work went, and it doesn't use Gluster, HCI, VDO or any of the GUIs.
I was extremely naive to believe you'd only need to click buttons in GUIs to run a
3-node HCI VDO cluster, but actually that expectation came from the presentation on this
node HCI VDO cluster, but actually that expectation came from the presentation on this
site.
And personally I believe that if that had worked, RHV-HCI would be thriving.
But as it turned out, both the setup GUI and the operational GUI rarely worked, very
likely because both were done by yet another team, while you guys only developed and
tested at the Ansible level and with SAN/NAS storage.
You will say that oVirt is a community project.
I will say that if you advertise oVirt as "designed to manage your entire enterprise
infrastructure" and put buttons in GUIs, people will take that at face value and
expect them to work.
I don't know how many oVirt-HCI deployments I did over the years, but of all the
releases I've tried with the Cockpit HCI wizard since 4.3.5 or so, none has ever just
worked. I've had to dig through logfiles all over the place to fix things like
blacklisted storage, when I was using Gluster. Then there were VDO options that
weren't supported yet on EL7 in the 4.3 Ansible scripts, VDO disappearing altogether after
a kernel upgrade on EL8, Python 2/3 issues and I don't know how many other problems,
just to get things set up.
I just did a full fresh set of setups with oVirt 4.4.10 when I was testing the
compatibility of the various downstream EL8 derivatives and it's still the same: the
Cockpit setup HCI wizard never just works. By now I know where to fiddle, but it saps
confidence in the product when every release fails the basic setup.
I can hear you saying "our CI only tests at script level". Can you guess why
that has an impact on quality?
And it was the same for nearly half of the operations in the oVirt GUI. I went through
each and every one of them and for starters I often couldn't find out what they were
supposed to do, while some even looked downright dangerous to click (e.g. "reset
brick").
Export and import OVA were pure nightmares, because it turned out that the exported
machines might in fact contain 100GB of zeros instead of the disk image. Even once that
was fixed, interoperability with other hypervisors, which is the whole purpose of OVA,
was zero.
By way of explanation I was told here that OVA import/export wasn't really "meant to be
used", much like HCI I guess.
There is a cluster upgrade button, but I think I only ever hit it once, only to notice
that it just created more damage and didn't add convenience. In fact upgrading any
node became an entirely manual job in the end, because the button never worked. The
Gluster daemon never started properly after a reboot, which left ovirt-ha-broker,
ovirt-ha-agent and vdsmd sulking and requiring carefully timed restarts to get going again.
And then there is the upgrade procedure for a high-availability HCI from EL7/oVirt 4.3 to
EL8/oVirt 4.4: 40 steps or so, none of which were allowed to fail, and no obvious
fallback. That just isn't "enterprise".
To my eyes the value proposition of oVirt was "instant on-premises fault-tolerant
cloud", something I could then use to run VMs or indeed OpenShift on.
oVirt never delivered that at enterprise quality and I can't see it getting any closer
without a downstream product.
Even a community needs a concrete vision, or perhaps at least a few real use cases.