Can't add a pre-existing (storage-only) gluster member as a compute-also node using self-hosted hyper-convergent engine

Intro: Trying to add an oVirt self-hosted engine to a pre-existing set of systems already using Gluster storage seems to fail, because the hosted-engine wizard appears to create its own peer pool for the managed hosts instead of joining the one already used for storage.

Context: I am testing on a set of four Atom Goldmont+ boxes, which are silent, low-power, cheap, fast enough for edge workloads and even take 32GB of RAM these days. But for various, sometimes even good, reasons they are not a mainline platform, and I attributed many of the problems I faced to the niche hardware, sometimes correctly. Because the three-node hyperconverged setup has very exacting requirements that are hard to meet on pre-existing machines (and, in my case, produced many initial failures with too little insight), I created the storage Gluster separately first and then used the "hosted engine" wizard to set up the hosted engine on one of the Atom nodes. I used CentOS 7 (fresh install with latest updates) on the primary nodes, not the oVirt Node image, because some of my targets are big HPC machines that are supposed to run oVirt for support services in a small niche, while Nvidia-Docker/SLURM workloads dominate. I assumed that splitting the storage and the orchestration setup into distinct steps would give me both better insight and the flexibility to expand or transform the storage and the compute without losing any of the self-hosted hyperconvergence comfort.

Problem: I had all kinds of problems getting the hosted engine to run all the way through on the Atoms; it typically stopped just shy of the final launch as a VM on the Gluster storage. I eventually stumbled across this message from Yedidyah Bar David: https://lists.ovirt.org/pipermail/users/2018-March/087923.html I then had a look at the engine database and found that the compute nodes were indeed all in a separate Gluster peer pool, newly created by the hosted-engine setup and evidently used for cluster synchronization, instead of joining the pool already used for the bulk of the storage. I don't know whether this would be considered an error that needs fixing, an issue that can be avoided with manual configuration, or something else. I believe it could use some highlighting in the documentation.
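For anyone who wants to check their own setup for the same split, here is a minimal sketch (my own, not part of any oVirt or Gluster tooling) that compares the peer pool seen by a storage node with the one seen by the freshly deployed compute node, by parsing "gluster pool list" over SSH. The hostnames are placeholders, and it assumes passwordless root SSH and the gluster CLI on each host:

#!/usr/bin/env python3
# Sketch: compare Gluster peer pool membership between a storage node and the
# new hosted-engine/compute node. If the UUID sets differ, the hosts are in
# separate peer pools. Hostnames below are placeholders for your environment.
import subprocess

STORAGE_NODE = "gluster1.example.com"   # one of the pre-existing storage nodes
COMPUTE_NODE = "atom1.example.com"      # the node the hosted-engine wizard set up

def peer_uuids(host):
    """Return the set of peer UUIDs reported by 'gluster pool list' on a host."""
    out = subprocess.run(
        ["ssh", f"root@{host}", "gluster", "pool", "list"],
        capture_output=True, text=True, check=True
    ).stdout
    uuids = set()
    for line in out.splitlines()[1:]:        # skip the header line
        fields = line.split()
        if fields:
            uuids.add(fields[0])             # first column is the peer UUID
    return uuids

if __name__ == "__main__":
    storage_pool = peer_uuids(STORAGE_NODE)
    compute_pool = peer_uuids(COMPUTE_NODE)
    if storage_pool == compute_pool:
        print("Storage and compute hosts share one peer pool.")
    else:
        print("Different peer pools detected:")
        print("  storage-only UUIDs:", storage_pool - compute_pool)
        print("  compute-only UUIDs:", compute_pool - storage_pool)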

Oops, sorry for the noise: the problem could have been me... The host that I used to create the hosted-engine appliance might have been running a local gluster at the time the wizard ran (a leftover from a single-node install). That might explain why the wizard would then expand that pool with new nodes and not notice that the storage and compute nodes point to different Glusters... I'll tear it down, rebuild, and report back.
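A small sanity check I wish I had run before starting the wizard, again only a sketch of my own and not official tooling: warn if glusterd is already running on the host and already knows about peers, i.e. the host is not a blank slate and a deployment might silently extend that leftover pool.

#!/usr/bin/env python3
# Sketch: detect a leftover local gluster on the host before running the
# hosted-engine wizard. Uses only standard commands (systemctl, gluster CLI).
import subprocess

def glusterd_active():
    """True if the glusterd service is currently running on this host."""
    r = subprocess.run(["systemctl", "is-active", "--quiet", "glusterd"])
    return r.returncode == 0

def peer_status_output():
    """Return the output of 'gluster peer status', or '' if it fails."""
    r = subprocess.run(["gluster", "peer", "status"],
                       capture_output=True, text=True)
    return r.stdout if r.returncode == 0 else ""

if __name__ == "__main__":
    if glusterd_active():
        print("WARNING: glusterd is already running on this host.")
        out = peer_status_output()
        if out:
            print("Output of 'gluster peer status' (check for existing peers):")
            print(out)
    else:
        print("glusterd is not active; no leftover local gluster detected.")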

I did rebuild, but that didn't resolve the problem or answer the real question, so I moved the discussion to a new topic: https://lists.ovirt.org/archives/list/users@ovirt.org/thread/X3UBQT3ZKAX64TW...