Yes, I've also posted this on the Gluster Slack. But I am using Gluster mostly because
it's part of oVirt HCI, so don't just send me away, please!
Problem: glusterd refuses to start due to quorum issues on volumes to which it isn't
contributing any bricks
(I've had this before on a different farm, but there it was transient. Now it happens
in a more observable way, which is why I'm opening a new topic.)
In a test farm with recycled servers, I started running Gluster via oVirt 3-node HCI,
because I originally had 3 machines.
They were set up as group A in a 2:1 (replica:arbiter) oVirt HCI setup with
'engine', 'vmstore' and 'data' volumes, one brick on each node.
I then got another five machines with hardware specs rather different from group
A, so I set those up as group B, mostly to act as compute nodes but also to provide extra
storage, mostly to be used externally as GlusterFS shares. It took a bit of fiddling with
Ansible, but I got these 5 nodes to serve two more Gluster volumes, 'tape' and
'scratch', as dispersed volumes (4 data : 1 redundancy), which I think of as RAID5.
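To make the "RAID5" analogy concrete, here is a minimal sketch of the arithmetic for a dispersed volume, assuming equal-sized bricks. The helper `disperse_layout` is hypothetical, just for illustration, and is not part of Gluster. With 4 data bricks and 1 redundancy brick, the total brick count per set is 5, 4/5 of the raw capacity is usable, and the volume tolerates the loss of one brick.

```python
def disperse_layout(data_bricks: int, redundancy: int) -> dict:
    """Hypothetical helper: capacity/fault-tolerance math for one dispersed set.

    Assumes all bricks in the set are the same size.
    """
    total = data_bricks + redundancy  # bricks in the dispersed set
    return {
        "disperse_count": total,                     # total bricks per set
        "usable_fraction": data_bricks / total,      # share of raw space usable
        "tolerated_brick_failures": redundancy,      # bricks that may be down
    }


layout = disperse_layout(4, 1)
print(layout)  # {'disperse_count': 5, 'usable_fraction': 0.8, 'tolerated_brick_failures': 1}
```

With a redundancy of 1, a second brick going down in the same set would exceed what the volume can tolerate, which is why bricks that were offline need to be healed before redundancy is restored.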
The two groups are in one Gluster trusted pool, not because they serve bricks to the same
volumes, but because oVirt doesn't like nodes to be in different Gluster pools (or, more
precisely, to already be in a pool when you add them as a host node). But the two groups
provide bricks to distinct volumes; there is no overlap.
After setup, things ran fine for weeks, but now I needed to restart a machine
from group B, which has 'tape' and 'scratch' bricks but none from the original oVirt
'engine', 'vmstore' and 'data' volumes in group A. Yet the gluster daemon refuses to start,
citing a loss of quorum for those three volumes, even though it has no bricks in them,
which makes no sense to me.
I suspect the source of the issue is conceptual: I clearly don't really
understand some of Gluster's design assumptions.
And I'm afraid the design assumptions of Gluster and of oVirt (even with HCI) are not
as closely aligned as the marketing materials on the oVirt home page suggest.
But most of all I'd like to know: how do I fix this now?
I can't heal 'tape' and 'scratch', which are drifting ever further out of sync,
while glusterd on this group B machine refuses to come online for lack of quorum
on volumes to which it contributes no bricks.