Intro:
Trying to add an oVirt self-hosted engine to a pre-existing set of systems already using
Gluster storage seems to fail, because the hosted-engine wizard creates its own peer pool
for the managed hosts instead of joining the one already serving the storage.
Context:
I am testing on a set of four Atom Goldmont+ boxes, which are silent, low-power, cheap,
fast enough for edge workloads, and even take 32 GB of RAM these days.
But for various, sometimes even good, reasons they are not a mainline platform, and I
attributed many of the problems I faced to the niche hardware, sometimes correctly.
Because the three-node hyperconverged setup has very exacting requirements that are hard
to meet on pre-existing machines (and, in my case, led to many initial failures with too
little insight), I created the Gluster storage separately first and then used the
hosted-engine wizard to set up the hosted engine on one of the Atom nodes.
I used CentOS 7 (fresh install and latest updates) on the primary nodes, not the oVirt
Node image, because some of my targets are big HPC machines that are supposed to run oVirt
for support services in a small niche, while Nvidia-Docker/SLURM workloads dominate.
I assumed that splitting the storage and the orchestration setup into distinct steps would
give me both better insight and the flexibility to expand or transform the storage and the
compute independently, without losing any of the self-hosted hyperconvergence comfort.
Problem:
I had all kinds of problems getting the hosted-engine deployment to run all the way
through on the Atoms; it typically stopped just shy of the final launch of the engine as a
VM on the Gluster storage.
I eventually stumbled across this message from Yedidyah Bar David:
https://lists.ovirt.org/pipermail/users/2018-March/087923.html
I then had a look at the engine database and found that the compute nodes were indeed all
in a separate Gluster peer pool, newly created by the hosted-engine setup and evidently
used for cluster synchronization, instead of having joined the pool already used for the
bulk of the storage.
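The split can also be seen outside the database by comparing the peer pools directly on
the hosts; a minimal check, assuming gluster CLI access on both a storage node and one of
the compute nodes enrolled by the wizard:

  # on a storage node: lists only the storage peers
  gluster pool list
  gluster peer status

  # on a compute node set up by the hosted-engine wizard: lists a different,
  # freshly created set of peers, with no overlap in UUIDs
  gluster pool list
  gluster peer status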
I don’t know if this would be considered an error that needs fixing, an issue that can be
avoided using a manual configuration, or something else. I believe it could use some
highlighting in the documentation.