Hello everyone,
I'm running an oVirt cluster (4.3.10.4-1.el7) on several physical nodes with CentOS
7.9.2009; the Hosted Engine runs as a virtual machine on one of these nodes. For storage,
I'm running GlusterFS 6.7 on three separate physical storage nodes (also CentOS 7). Gluster
hosts three volumes, of type "Replicate" or "Distributed-Replicate".
I recently updated the system packages and upgraded GlusterFS from 6.7 to 6.10 on the first
storage node (storage1), and "gluster volume heal <volname> info" now shows a potential
split-brain situation for one of the three volumes:
Brick storage1:/data/glusterfs/nvme/brick1/brick
Status: Connected
Number of entries: 0
Brick storage2:/data/glusterfs/nvme/brick1/brick
/c32d664d-69ba-4c3f-8ea1-240133963815/dom_md/ids
/
/.shard/.remove_me
Status: Connected
Number of entries: 3
Brick storage3:/data/glusterfs/nvme/brick1/brick
/c32d664d-69ba-4c3f-8ea1-240133963815/dom_md/ids
/
/.shard/.remove_me
Status: Connected
Number of entries: 3
I compared checksums and the "dom_md/ids" file has a different md5sum on each of the three
nodes. Triggering a heal on the volume does nothing and the entries remain. The heal info
for the other two volumes shows no entries. The exact commands I used are shown below.
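For reference, these are roughly the commands I used (the volume name "nvme" is taken from
the storage domain path mentioned below):

  # on each storage node, checksum the file directly on the brick
  md5sum /data/glusterfs/nvme/brick1/brick/c32d664d-69ba-4c3f-8ea1-240133963815/dom_md/ids

  # attempted heal, run from one of the storage nodes
  gluster volume heal nvme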
The affected Gluster volume (type: replicate) is mounted as a Storage Domain in oVirt using
the path "storage1:/nvme" and is used to store the root partitions of all virtual machines;
the virtual machines were running at the time of the upgrade and reboot of storage1. The
volume has three bricks, one on each storage node.
For the upgrade I followed the steps described at
https://docs.gluster.org/en/latest/Upgrade-Guide/generic-upgrade-procedure/: I stopped and
killed all Gluster-related services, upgraded both the system and Gluster packages, and
rebooted storage1.
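Concretely, the steps on storage1 were approximately the following:

  systemctl stop glusterd
  killall glusterfs glusterfsd glusterd   # make sure no brick/self-heal processes survive
  yum update                              # pulls in the OS updates and GlusterFS 6.10
  reboot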
Is this a split-brain situation, and if so, how can I resolve it? I would be very grateful
for any help.
Please let me know if you require any additional information.
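For example, I can post the extended attributes (trusted.afr.*) of the affected file from
each brick; I would collect them with something like the following, run on each of the
three storage nodes against the respective brick path:

  getfattr -d -m . -e hex /data/glusterfs/nvme/brick1/brick/c32d664d-69ba-4c3f-8ea1-240133963815/dom_md/ids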
Best regards