It would be interesting to know how the previous team got to six nodes: I don't
remember seeing any documentation on how to do that easily...
However, this state of affairs also seems to be quite normal whenever I reboot a single-node
HCI setup: I've seen it with two systems now, one running 4.3.11 on CentOS 7.8,
the other 4.4.1 on CentOS 8.2.
What seems to happen in my case is some sort of race condition or timeout:
ovirt-ha-broker, ovirt-ha-agent and vdsmd all seem to fail in various ways, because
glusterd isn't showing perfect connectivity between all storage nodes (actually, in
this case it still fails to be perfect even when there is only one node...).
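If you want to check whether you're hitting the same pattern, these are roughly the checks I
start with (just a sketch; the unit names are the stock ones on my hosts, yours may differ):

  systemctl status ovirt-ha-broker ovirt-ha-agent vdsmd
  journalctl -b -u ovirt-ha-agent
  gluster peer status
  gluster volume status all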
I tend to restart glusterd carefully on any node that is seen as disconnected or not up
(gluster volume status all), and once that is perfect and any gluster heals are through, I
restart ovirt-ha-broker, ovirt-ha-agent and vdsmd nice and slow, not really in any
particular order; I just have a look via systemctl status <name> to see whether they stop
complaining or crashing.
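In shell terms the sequence is roughly this (a sketch only, not a recipe; replace <volume>
with your own volume names and take your time between the steps):

  # check which nodes/bricks are disconnected or not online
  gluster volume status all

  # on any node that looks disconnected
  systemctl restart glusterd

  # wait until all bricks are up and pending heals are done
  gluster volume heal <volume> info

  # then, slowly, on each host
  systemctl restart ovirt-ha-broker ovirt-ha-agent vdsmd
  systemctl status ovirt-ha-broker ovirt-ha-agent vdsmd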
In the meantime I check with hosted-engine --vm-status on all nodes to see if this
"is the hosted engine setup finished" message disappears, and with a bit of
patience it tends to come back. You might also want to make sure that none of the nodes
are in local maintenance and that the whole data center isn't in global maintenance.
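The corresponding checks, again just as a sketch:

  hosted-engine --vm-status
  # clear local/global maintenance if a node got stuck there
  hosted-engine --set-maintenance --mode=none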
Let me tell you that I pulled out a lot of hair when I started with oVirt, because I tend
to expect immediate reactions to any command I give. But here there is so much
automation going on in the background that commands are really more like a bit of grease
on the cogs of a giant gearbox, and most of the time it just works automagically.