On 14.09.2020 at 15:23, thomas(a)hoberg.net wrote:
Sorry, twice over:
1. This is a duplicate post, because the delay before posts show up on the
web site keeps getting longer (I am responding via mail, and the first post
still hasn't appeared...)
2. It seems to have been a wild goose chase: the gluster daemon on the
group B node did eventually regain quorum (or came back to its senses) some
time later... the error message is pretty scary and IMHO somewhat
misleading, but...
With oVirt one must learn to be patient; evidently all that built-in
self-healing depends on state machines turning their cogs and gears, not on
admins pushing for things to happen... sorry!
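For anyone who lands here with the same symptoms: the usual way to see that things
really have settled down again is the plain gluster CLI (nothing oVirt-specific;
run from any node whose glusterd is up):

    # all peers should be back to "Peer in Cluster (Connected)"
    gluster peer status

    # every brick of every volume should show Online = Y again
    gluster volume status

    # pending heal entries should drain back to zero over time
    gluster volume heal scratch info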
Yes, I've also posted this on the Gluster Slack. But I am using
Gluster mostly because it's part of oVirt HCI, so don't just send me away,
please!
Problem: GlusterD refusing to start due to quorum issues for volumes where it isn’t
contributing any brick
(I've had this before on a different farm, but there it was transitory. Now I have it
in a more observable state, which is why I'm opening a new topic.)
In a test farm with recycled servers, I started running Gluster via oVirt 3node-HCI,
because I got 3 machines originally.
They were set up as group A in a 2:1 (replica:arbiter) oVirt HCI setup with
'engine', 'vmstore' and 'data' volumes, one brick on each node.
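From memory, what the HCI wizard creates is roughly the following; brick paths and
host names here are illustrative, not copied from the real config:

    gluster volume create engine replica 3 arbiter 1 \
        nodeA1:/gluster_bricks/engine/engine \
        nodeA2:/gluster_bricks/engine/engine \
        nodeA3:/gluster_bricks/engine/engine
    # same pattern for 'vmstore' and 'data'; the third brick is the arbiter.
    # The wizard also sets a pile of volume options, among them (I believe)
    # cluster.server-quorum-type=server, which matters for what follows.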
I then got another five machines with hardware specs rather different from group
A's, so I set those up as group B, mostly to act as compute nodes, but also to provide extra
storage, mainly to be used externally via GlusterFS shares. It took a bit of fiddling with
Ansible, but I got these 5 nodes to serve two more Gluster volumes, 'tape' and
'scratch', using dispersed bricks (4 data : 1 redundancy), RAID5 in my mind.
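The create commands ended up looking roughly like this (host names and brick paths
are placeholders again):

    gluster volume create scratch disperse-data 4 redundancy 1 \
        nodeB1:/bricks/scratch nodeB2:/bricks/scratch \
        nodeB3:/bricks/scratch nodeB4:/bricks/scratch \
        nodeB5:/bricks/scratch
    # 4 data bricks + 1 redundancy brick per subvolume, hence the RAID5 analogy;
    # 'tape' was created the same way.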
The two groups are in one Gluster pool, not because they serve bricks to the same volumes, but
because oVirt doesn't like its hosts to be in different pools (or rather, to already
be in a pool when you add them as host nodes). But the two groups provide bricks to
distinct volumes; there is no overlap.
After setup, things ran fine for weeks, but now I needed to restart a
machine from group B, which holds 'tape' and 'scratch' bricks, but none from the original oVirt
volumes 'engine', 'vmstore' and 'data' in group A. Yet the gluster daemon refuses to start, citing
a loss of quorum for those three volumes, even though it has no bricks in them... which makes no
sense to me.
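What I've been poking at to make sense of it, in case it rings a bell for someone
(standard commands, run from a node whose glusterd is up; my reading that server-side
quorum is evaluated against the whole peer pool rather than per brick may well be wrong):

    # what glusterd itself complains about on the group B node
    grep -i quorum /var/log/glusterfs/glusterd.log | tail -20

    # the oVirt volumes appear to have server-side quorum enabled
    # (set by the HCI deployment, I think)
    gluster volume get engine cluster.server-quorum-type
    gluster volume get vmstore cluster.server-quorum-type
    gluster volume get data cluster.server-quorum-type

    # the group B volumes, for comparison
    gluster volume get tape cluster.server-quorum-type
    gluster volume get scratch cluster.server-quorum-type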
I am afraid the root of the issue is conceptual: I clearly don't fully
understand some of Gluster's design assumptions.
And I'm afraid the design assumptions of Gluster and of oVirt (even with HCI) are
not as aligned as one might assume from the marketing material on the oVirt home page.
But most of all I'd like to know: how do I fix this now?
I can't heal 'tape' and 'scratch', which are drifting ever further apart,
while glusterd on this group B machine refuses to come online for lack of quorum
on volumes to which it contributes no bricks.
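For reference, these are the standard commands I expect to need once the daemon is
back, to check and then trigger healing of the two dispersed volumes:

    # list entries still pending heal on each brick
    gluster volume heal tape info
    gluster volume heal scratch info

    # once the restarted node's bricks are online again, kick off an index heal
    gluster volume heal tape
    gluster volume heal scratch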