
On Mon, Feb 7, 2022 at 3:04 PM Sandro Bonazzola <sbonazzo(a)redhat.com> wrote:
The oVirt storage team never worked on HCI and we don't plan to work on it in the future. HCI was designed and maintained by Gluster folks. Our contribution for HCI was adding 4k support, enabling usage of VDO.
Improving on the HCI side is unlikely to come from Red Hat, but nothing blocks other companies or contributors from working on this.
Our focus for 4.5 is Managed Block Storage and incremental backup.
Nir
Hi Nir,
thank you for the clear message and the confirmation of a suspicion that has been growing for a while: HCI is an unwanted stepchild, not an equally supported option or even a strategic direction. And it's perhaps not the only one: in the meantime I can see VDO and the GUIs getting neglected, too.
My problem (and potentially that of many others) is that this unbalanced attention wasn't communicated or made visible. HCI may have disappeared from the oVirt front page today, in fact it's hard to find these days, but it was very prominent on 4.3 when I started. And as a core developer you may not realize this, but the first exposure to oVirt for many (most?) newcomers isn't the command line. It's the Cockpit wizard, where there are two HCI choices next to the one SAN/NAS option, which is evidently where 95% of the oVirt team's work went and which doesn't use Gluster, HCI, VDO or any of the GUIs.
I was extremely naive to believe you'd only need to click buttons in GUIs to run a 3-node HCI VDO cluster, but that expectation actually came from the presentation on this site. And personally I believe that if that had worked, RHV-HCI would be thriving. But as it turned out, both the setup GUI and the operational GUI rarely ever worked, very likely because both were done by yet another team, while you guys only worked and tested at the Ansible level and with SAN/NAS storage.
You will say that oVirt is a community project. I will say that if you advertise oVirt as "designed to manage your entire enterprise infrastructure" and put buttons in GUIs, people will take that at face value and expect them to work.
I don't know how many oVirt-HCI deployments I did over the years, but of all the releases I've tried with the Cockpit HCI wizard since 4.3.5 or so, none has ever just worked. I've had to dig through logfiles all over the place to fix things like blacklisted storage when I was using Gluster.
Then there were VDO options that weren't supported yet on EL7 in the 4.3 Ansible scripts, VDO disappearing altogether after a kernel upgrade on EL8, Python 2/3 issues and I don't know how many other problems, just to get things set up.
I just did a full fresh set of setups with oVirt 4.4.10 when I was testing the compatibility of the various downstream EL8 derivatives and it's still the same: the Cockpit HCI setup wizard never just works. By now I know where to fiddle, but it saps confidence in the product when every release fails the basic setup. I can hear you saying "our CI only tests at script level". Can you guess why that has an impact on quality?
And it was the same for nearly half of the operations in the oVirt GUI. I went through each and every one of them and, for starters, I often couldn't find out what they were supposed to do, while some even looked downright dangerous to click (e.g. "reset brick").
Export and import of OVA were pure nightmares, because it turned out that the exported machines might in fact contain 100GB of zeros instead of the disk image. Even once that was fixed, interoperability with other hypervisors (that's the purpose of OVA) was zero. As an explanation I was told here that OVA import/export wasn't really "meant to be used", much like HCI I guess.
There is a cluster upgrade button, but I think I only ever hit it once, only to notice that it just created more damage and didn't add convenience. In fact, upgrading any node became an entirely manual job in the end, because the button never worked. The Gluster daemon never started properly after a reboot, which left ovirt-ha-broker, ovirt-ha-agent and vdsmd sulking and requiring carefully timed restarts to get going again.
And then an upgrade procedure for a high-availability HCI from EL7/oVirt 4.3 to EL8/oVirt 4.4 that had 40 steps or so, none of which were allowed to fail, and with no obvious fallback, just isn't "enterprise".
To my eyes the value proposition of oVirt was "instant on-premise fault tolerant cloud", something I could then use to run VMs or indeed OpenShift on. oVirt never delivered that at enterprise quality and I can't see it getting any closer without a downstream product. Even a community needs a concrete vision, or perhaps at least a few real use cases.