It was this near endless range of possibilities via permutation of the parts that
originally attracted me to oVirt.
Being clearly a member of the original Lego generation, I imagined how you could simply add
blocks of this and that and rebuild them into something new and fantastic: limitless Gluster
scaling and HCI, VDO dedup/compression, SSD caching, nested virtualization,
geo-replication, OVA support, it just had everything I might want!
The truth, from what I have gathered around here over almost three years, is rather ugly.
You don't mention it explicitly, but you are evidently talking about an HCI setup,
preferably installed, expanded and managed via the nice Cockpit/Engine GUIs.
What I learned is that Gluster-based HCI receives very little love from the oVirt
developers, even if to you and me (Lego people?) it seems the most attractive option.
My impression is that they tend to work more with the original non-HCI approach based on
SAN or NFS storage, even if the engine may now run as a VM by default, where originally it
lived on separate servers.
Gluster, VDO, Ansible and the Java engine are all acquisitions, and HCI was more of a
management decision to counter Nutanix. I find that following the evolution from Moshe
Bar's Mosix to Qumranet, and on through the Permabit, Ansible and Gluster acquisitions as
well as the competitor products, helps to understand why things are as they are.
The current oVirt code base supports single-node HCI, which delivers no benefits beyond
testing. It also supports 3-node HCI, which kind of works with the Cockpit wizard (I have
never succeeded with an installation where I didn't have to do some fiddling).
Beyond that you already branch out into the jungle of the not-tested and not-supported,
even if I remember reading that 6-node and 9-node HCI setups seem possible. But quorums
with 6 nodes don't seem natural, and certainly nobody would want to use replicas in a
9-node HCI setup, right?
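Just to put rough numbers on why replica 3 across 9 nodes feels wasteful (back-of-the-envelope
only, assuming equal bricks of, say, 10 TB each): replica 3 keeps three full copies, so
9 x 10 TB of raw disk yields about 30 TB usable, while a dispersed 7+2 volume over the same
bricks would yield about 70 TB usable and still survive the loss of two bricks. That
capacity gap is the whole attraction of erasure coding.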
I've seen actual "not supported" comments in the oVirt Ansible code that stops any
installation using dispersed volumes, so there is your immediate answer.
I've tricked it past that point by editing the Ansible scripts and actually got oVirt
running with dispersed (erasure-coded) volumes on 5 nodes, but it felt too wobbly for real use.
Instead I've added the CPU/RAM of those 5 nodes to a 3-node HCI setup and then used their
disks to create an erasure-coded Gluster volume, which VMs can then use via Gluster or NFS
pretty much like a filer.
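For illustration, the standalone erasure-coded volume boils down to something like this on
the Gluster side (hostnames, brick paths and the 4+1 layout are just placeholders here, not
my actual layout):

  # from node1: bring the other brick nodes into the pool
  gluster peer probe node2          # likewise node3, node4, node5
  # one brick per node, 4 data + 1 redundancy
  gluster volume create ecvol disperse 5 redundancy 1 \
      node1:/bricks/ec/brick node2:/bricks/ec/brick node3:/bricks/ec/brick \
      node4:/bricks/ec/brick node5:/bricks/ec/brick
  gluster volume start ecvol

VMs can then mount the volume over glusterfs or reach it over NFS (e.g. via NFS-Ganesha),
which is what I mean by using it pretty much like a filer.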
I can't recommend that unless you're willing to pay the price of being on your
own.
One major issue here is that you can't just create glusters and then merge them, as any
system can only ever be a member of one gluster. And you have to destroy volumes before
moving a node to another gluster: not sure how that rhymes with bottleneck-free scalability.
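To illustrate that limit (the CLI output is paraphrased from memory, so treat it as a sketch):

  # from a node in the pool you want to grow:
  gluster peer probe nodeX
  # peer probe: failed: nodeX is already part of another cluster
  # from a node in nodeX's current pool:
  gluster peer detach nodeX
  # peer detach: failed: Brick(s) with the peer nodeX exist in cluster

So you first have to remove nodeX's bricks from its volumes (or destroy the volumes), then
detach it from the old pool, and only then can the new pool probe it.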
And then there are various parts of oVirt that look for quorums and find them missing
even when nodes aren't actually contributing bricks to a volume (compute-only hosts or
non-HCI volumes).
I guess it's safe to say that everybody here would like to see you try it and feed the
changes required to make it work back into the project...
...but it's not "supported out of the box".
P.S. When I'm offered erasure coding, VDO dedup/compression and thin allocation, I
naturally tend to tick all the boxes (originally I also chose SSD caching, but quickly
changed to SSD-only storage). It's only later that they mention somewhere that you aren't
supposed to use them in combination, with no full explanation or analysis to follow.
Because even at today's SSD storage pricing I need all the tricks in the book, I stayed
with them all, but Gluster doesn't seem to be a speed demon, no matter what you put on
top.
And to think that there used to be an RDMA/InfiniBand option, which later disappeared with
little comment...
VMs in a 9-node HCI replica gluster could only ever get ~1Gbit/s throughput on a 10Gbit/s
network, so erasure coding should be the smart choice...