How to forcibly remove a dead node from a Gluster replica 3 arbiter 1 setup?

Hi All,
My setup runs on three nodes: one is the arbiter node and the other two are replica nodes. One of my replica nodes is now entirely dead and I want to replace it with a new node. How do I proceed with the replacement and safely remove the dead replica node?

Hello,
There is an Ansible playbook [1] in the gluster-ansible repository [2] for replacing a failed host in a Gluster-enabled cluster. Do check it out and see if it works for your case.
[1] https://github.com/gluster/gluster-ansible/blob/master/playbooks/hc-ansible-...
[2] https://github.com/gluster/gluster-ansible

In order to remove a dead host you will need to:
- Remove all bricks originating from that host, in all volumes of the trusted storage pool (TSP):
  gluster volume remove-brick engine host3:/gluster_bricks/brick1/ force
- Remove the host from the TSP:
  gluster peer detach host3
- Next, remove the host from oVirt. In some cases you have to mark the host as rebooted, so the engine can cancel any tasks that were supposed to run there.

IT IS MANDATORY TO VERIFY THAT THE HOST IS OFFLINE AND HAS NO ACCESS TO THE STORAGE. IF THE NODE IS STILL ALIVE - FIX VDSM!

Best Regards,
Strahil Nikolov
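Spelled out against the three-node example above, with a verification step first, the sequence might look like this (host3 and the volume name engine are just the placeholders used in the reply; substitute your own peer names, volume names and brick paths):

  # confirm the dead peer is really unreachable and list the bricks it still owns
  gluster peer status
  gluster volume info

  # remove the dead host's brick from each affected volume
  # (depending on the volume layout you may also need to pass the new
  #  replica count, as discussed later in this thread)
  gluster volume remove-brick engine host3:/gluster_bricks/brick1/ force

  # finally detach the dead peer from the trusted storage pool
  gluster peer detach host3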

When I try to remove all of node 2's bricks, I get the error below:
  volume remove-brick commit force: failed: Bricks not from same subvol for replica
When I try to remove just one of node 2's bricks, I get:
  volume remove-brick commit force: failed: Remove brick incorrect brick count of 1 for replica 3
My setup runs replica 3 with 1 arbiter, and I am unable to remove the second (dead) node...

The volume is configured as a Distributed-Replicate volume (7 x (2 + 1) = 21 bricks). When I try from the GUI I get the error below:
  Error while executing action Remove Gluster Volume Bricks: Volume remove brick force failed: rc=-1 out=() err=['Remove arbiter brick(s) only when converting from arbiter to replica 2 subvolume_']

You have to specify the volume type. When you remove 1 brick from a replica 3 volume, you are actually converting it to replica 2. As you've got 2 data bricks + 1 arbiter, just remove the arbiter brick and the missing node's brick:
  gluster volume remove-brick VOL replica 1 node2:/brick node3:/brick force
What is the output of 'gluster volume info VOL' and 'gluster volume heal VOL info summary'?

Best Regards,
Strahil Nikolov
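For a distributed-replicate layout like the one described in this thread (7 replica sets of 2 data bricks + 1 arbiter), any such command has to list the matching bricks from every replica set, not just one. A rough sketch with placeholder host and brick names (not the real paths from this cluster):

  # going to replica 2: remove only the arbiter brick of every replica set
  gluster volume remove-brick VOL replica 2 \
      arb-host:/bricks/set1/arb arb-host:/bricks/set2/arb \
      arb-host:/bricks/set3/arb ... force

  # going to replica 1: remove the dead node's data brick plus the arbiter
  # brick of every replica set (two bricks per set)
  gluster volume remove-brick VOL replica 1 \
      dead-host:/bricks/set1/data arb-host:/bricks/set1/arb \
      dead-host:/bricks/set2/data arb-host:/bricks/set2/arb ... force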

Hi Strahil Nikolov,
Thank you for the suggestion, but it does not help...

[root@beclovkvma01 ~]# sudo gluster volume remove-brick datastore1 replica 1 beclovkvma02.bec..net:/data/brick2/brick2 beclovkvma02.bec..net:/data/brick3/brick3 beclovkvma02.bec..net:/data/brick4/brick4 beclovkvma02.bec..net:/data/brick5/brick5 beclovkvma02.bec..net:/data/brick6/brick6 beclovkvma02.bec..net:/data/brick7/brick7 beclovkvma02.bec..net:/data/brick8/brick8 force
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume. Do you want to continue? (y/n) y
volume remove-brick commit force: failed: need 14(xN) bricks for reducing replica count of the volume from 3 to 1

[root@beclovkvma01 ~]# sudo gluster volume remove-brick datastore1 replica 1 beclovkvma02.bec..net:/data/brick2/brick2 force
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume. Do you want to continue? (y/n) y
volume remove-brick commit force: failed: need 14(xN) bricks for reducing replica count of the volume from 3 to 1

[root@beclovkvma01 ~]# sudo gluster volume remove-brick datastore1 replica 2 beclovkvma02.bec..net:/data/brick2/brick2 force
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume. Do you want to continue? (y/n) y
volume remove-brick commit force: failed: need 7(xN) bricks for reducing replica count of the volume from 3 to 2
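These brick counts follow directly from the volume layout: datastore1 has 7 replica sets of (2 data + 1 arbiter) bricks, so reducing the replica count from 3 to 1 means removing 2 bricks from every set (7 x 2 = 14 bricks), and reducing from 3 to 2 means removing 1 brick from every set (7 x 1 = 7 bricks, i.e. the arbiters). Listing only beclovkvma02's bricks satisfies neither count, hence the errors.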

Please provide the output of 'gluster volume info datastore1' and specify which bricks you want to remove.

Best Regards,
Strahil Nikolov

Hi Strahil Nikolov,

Volume Name: datastore1
Type: Distributed-Replicate
Volume ID: bc362259-14d4-4357-96bd-8db6492dc788
Status: Started
Snapshot Count: 0
Number of Bricks: 7 x (2 + 1) = 21
Transport-type: tcp
Bricks:
Brick1: beclovkvma01.bec.net:/data/brick2/brick2
Brick2: beclovkvma02.bec.net:/data/brick2/brick2
Brick3: beclovkvma03.bec.net:/data/brick1/brick2 (arbiter)
Brick4: beclovkvma01.bec.net:/data/brick3/brick3
Brick5: beclovkvma02.bec.net:/data/brick3/brick3
Brick6: beclovkvma03.bec.net:/data/brick1/brick3 (arbiter)
Brick7: beclovkvma01.bec.net:/data/brick4/brick4
Brick8: beclovkvma02.bec.net:/data/brick4/brick4
Brick9: beclovkvma03.bec.net:/data/brick1/brick4 (arbiter)
Brick10: beclovkvma01.bec.net:/data/brick5/brick5
Brick11: beclovkvma02.bec.net:/data/brick5/brick5
Brick12: beclovkvma03.bec.net:/data/brick1/brick5 (arbiter)
Brick13: beclovkvma01.bec.net:/data/brick6/brick6
Brick14: beclovkvma02.bec.net:/data/brick6/brick6
Brick15: beclovkvma03.bec.net:/data/brick1/brick6 (arbiter)
Brick16: beclovkvma01.bec.net:/data/brick7/brick7
Brick17: beclovkvma02.bec.net:/data/brick7/brick7
Brick18: beclovkvma03.bec.net:/data/brick1/brick7 (arbiter)
Brick19: beclovkvma01.bec.net:/data/brick8/brick8
Brick20: beclovkvma02.bec.net:/data/brick8/brick8
Brick21: beclovkvma03.bec.net:/data/brick1/brick8 (arbiter)
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.lookup-optimize: off
server.keepalive-count: 5
server.keepalive-interval: 2
server.keepalive-time: 10
server.tcp-user-timeout: 20
network.ping-timeout: 30
server.event-threads: 4
client.event-threads: 4
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
performance.strict-o-direct: on
performance.low-prio-threads: 32
storage.owner-gid: 36
storage.owner-uid: 36
network.remote-dio: off

All the bricks from beclovkvma02.bec.net need to go, as node 2 is dead; the node has since been reinstalled.

I wanted to remove beclovkvma02.bec.net since the node was dead. I have now reinstalled that node and am trying to add it back as a 4th node, beclovkvma04.bec.net; however, since the system UUID is the same, I am not able to add the node to the oVirt Gluster cluster.
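One workaround sometimes suggested for this situation - offered here purely as an assumption to verify against the oVirt documentation, not something confirmed in this thread - is to regenerate the host ID that VDSM reports to the engine on the reinstalled node, after the old host entry has been removed from the engine:

  # assumption: VDSM stores the host UUID it reports to the engine in /etc/vdsm/vdsm.id
  uuidgen > /etc/vdsm/vdsm.id
  systemctl restart vdsmd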

As I mentioned in the Slack, the safest approach is to:

1. Reduce the volume to replica 1 (there is no need to keep the arbiter until resynchronization):
gluster volume remove-brick VOLUME replica 1 beclovkvma02.bec.net:/data/brick2/brick2 beclovkvma03.bec.net:/data/brick1/brick2 beclovkvma02.bec.net:/data/brick3/brick3 beclovkvma03.bec.net:/data/brick1/brick3 beclovkvma02.bec.net:/data/brick4/brick4 beclovkvma03.bec.net:/data/brick1/brick4 beclovkvma02.bec.net:/data/brick5/brick5 beclovkvma03.bec.net:/data/brick1/brick5 beclovkvma02.bec.net:/data/brick6/brick6 beclovkvma03.bec.net:/data/brick1/brick6 beclovkvma02.bec.net:/data/brick7/brick7 beclovkvma03.bec.net:/data/brick1/brick7 beclovkvma02.bec.net:/data/brick8/brick8 beclovkvma03.bec.net:/data/brick1/brick8 force
Note: I might have missed a brick, so verify that you are selecting all bricks from the arbiter and from beclovkvma02.
2. Remove the broken node:
gluster peer detach beclovkvma02.bec.net force
3. Add the freshly installed host:
gluster peer probe beclovkvma04.bec.net
4. Unmount all bricks on the arbiter, then reformat them:
mkfs.xfs -f -i size=512 /path/to/each/arbiter/brick/LV
5. Check if fstab is using UUIDs and, if so, update the entries with the /dev/VG/LV paths or with the new UUIDs (blkid should help).
6. Mount all bricks on the arbiter - no errors should be reported:
mount -a
7. Unmount, reformat and remount all bricks on beclovkvma04.bec.net. Don't forget to check fstab; 'mount -a' is your first friend.
8. Re-add the bricks to the volume. Order is important (first 04, then arbiter, 04, arbiter, ...):
gluster volume add-brick VOLUME replica 3 arbiter 1 beclovkvma04.bec.net:/data/brick2/brick2 beclovkvma03.bec.net:/data/brick1/brick2 beclovkvma04.bec.net:/data/brick3/brick3 beclovkvma03.bec.net:/data/brick1/brick3 beclovkvma04.bec.net:/data/brick4/brick4 beclovkvma03.bec.net:/data/brick1/brick4 beclovkvma04.bec.net:/data/brick5/brick5 beclovkvma03.bec.net:/data/brick1/brick5 beclovkvma04.bec.net:/data/brick6/brick6 beclovkvma03.bec.net:/data/brick1/brick6 beclovkvma04.bec.net:/data/brick7/brick7 beclovkvma03.bec.net:/data/brick1/brick7 beclovkvma04.bec.net:/data/brick8/brick8 beclovkvma03.bec.net:/data/brick1/brick8
9. Trigger a full heal:
gluster volume heal VOLUME full
10. If your bricks are highly performant and you need to speed up the healing, you can increase these volume settings:
- cluster.shd-max-threads
- cluster.shd-wait-qlength

Best Regards,
Strahil Nikolov
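To watch the resynchronization from step 9 and apply the tuning from step 10, something along these lines should work. The volume name datastore1 and the option names come from earlier in the thread (the volume currently runs cluster.shd-max-threads: 8 and cluster.shd-wait-qlength: 10000); the raised values below are illustrative assumptions, not recommendations:

  # monitor heal progress until the pending counts drop to zero
  gluster volume heal datastore1 info summary

  # optionally raise the self-heal daemon limits while the resync runs
  gluster volume set datastore1 cluster.shd-max-threads 16
  gluster volume set datastore1 cluster.shd-wait-qlength 20000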
Participants (3):
- dhanaraj.ramesh@yahoo.com
- Ritesh Chikatwar
- Strahil Nikolov