On 14.09.2020 at 15:23, thomas(a)hoberg.net wrote:
Sorry, twice over:
1. This is a duplicate post, because the delay before posts show up on the
web site keeps getting longer (I am responding via mail, and the first post
still hasn't appeared...)
2. It seems to have been a wild goose chase: the gluster daemon on the
group B node did eventually regain quorum (or came back to its senses) some
time later... the error message is pretty scary and IMHO somewhat
misleading, but...
With oVirt one must learn to be patient; evidently all that built-in
self-healing depends on state machines turning their cogs and gears, not on
admins pushing for things to happen... sorry!
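For anyone who lands here with the same symptoms: the usual way to see that things
really have settled down again is the plain gluster CLI (nothing oVirt-specific;
run from any node whose glusterd is up):

    # all peers should be back to "Peer in Cluster (Connected)"
    gluster peer status

    # every brick of every volume should show Online = Y again
    gluster volume status

    # pending heal entries should drain back to zero over time
    gluster volume heal scratch info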
Yes, I've also posted this on the Gluster Slack. But I am using
Gluster mostly because it's part of oVirt HCI, so don't just send me away,
please!
Problem: GlusterD refusing to start due to quorum issues for volumes where it isn’t
contributing any brick
(I've had this before on a different farm, but there it was transitory. Now I have it
in a more observable state, which is why I'm opening a new topic.)
In a test farm with recycled servers, I started running Gluster via oVirt 3node-HCI,
because I got 3 machines originally.
They were set up as group A in a 2:1 (replica:arbiter) oVirt HCI setup with
'engine', 'vmstore' and 'data' volumes, one brick on each node.
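From memory, what the HCI wizard creates is roughly the following; brick paths and
host names here are illustrative, not copied from the real config:

    gluster volume create engine replica 3 arbiter 1 \
        nodeA1:/gluster_bricks/engine/engine \
        nodeA2:/gluster_bricks/engine/engine \
        nodeA3:/gluster_bricks/engine/engine
    # same pattern for 'vmstore' and 'data'; the third brick is the arbiter.
    # The wizard also sets a pile of volume options, among them (I believe)
    # cluster.server-quorum-type=server, which matters for what follows.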
I then got another five machines with hardware specs rather different from group
A's, so I set those up as group B, mostly to act as compute nodes, but also to provide extra
storage, mainly to be used externally via GlusterFS shares. It took a bit of fiddling with
Ansible, but I got these 5 nodes to serve two more Gluster volumes, 'tape' and
'scratch', using dispersed bricks (4 data : 1 redundancy), RAID5 in my mind.
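The create commands ended up looking roughly like this (host names and brick paths
are placeholders again):

    gluster volume create scratch disperse-data 4 redundancy 1 \
        nodeB1:/bricks/scratch nodeB2:/bricks/scratch \
        nodeB3:/bricks/scratch nodeB4:/bricks/scratch \
        nodeB5:/bricks/scratch
    # 4 data bricks + 1 redundancy brick per subvolume, hence the RAID5 analogy;
    # 'tape' was created the same way.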
The two groups are in one Gluster pool, not because they serve bricks to the same volumes, but
because oVirt doesn't like its hosts to be in different pools (or rather, to already
be in a pool when you add them as host nodes). But the two groups provide bricks to
distinct volumes; there is no overlap.
After setup, things ran fine for weeks, but now I needed to restart a
machine from group B, which holds 'tape' and 'scratch' bricks, but none from the original oVirt
volumes 'engine', 'vmstore' and 'data' in group A. Yet the gluster daemon refuses to start, citing
a loss of quorum for those three volumes, even though it has no bricks in them... which makes no
sense to me.
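What I've been poking at to make sense of it, in case it rings a bell for someone
(standard commands, run from a node whose glusterd is up; my reading that server-side
quorum is evaluated against the whole peer pool rather than per brick may well be wrong):

    # what glusterd itself complains about on the group B node
    grep -i quorum /var/log/glusterfs/glusterd.log | tail -20

    # the oVirt volumes appear to have server-side quorum enabled
    # (set by the HCI deployment, I think)
    gluster volume get engine cluster.server-quorum-type
    gluster volume get vmstore cluster.server-quorum-type
    gluster volume get data cluster.server-quorum-type

    # the group B volumes, for comparison
    gluster volume get tape cluster.server-quorum-type
    gluster volume get scratch cluster.server-quorum-type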
I am afraid the root of the issue is conceptual: I clearly don't fully
understand some of Gluster's design assumptions.
And I'm afraid the design assumptions of Gluster and of oVirt (even with HCI) are
not as aligned as one might assume from the marketing material on the oVirt home page.
But most of all I'd like to know: how do I fix this now?
I can't heal 'tape' and 'scratch', which are drifting ever further apart,
while glusterd on this group B machine refuses to come online for lack of quorum
on volumes to which it contributes no bricks.
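For reference, these are the standard commands I expect to need once the daemon is
back, to check and then trigger healing of the two dispersed volumes:

    # list entries still pending heal on each brick
    gluster volume heal tape info
    gluster volume heal scratch info

    # once the restarted node's bricks are online again, kick off an index heal
    gluster volume heal tape
    gluster volume heal scratch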