What you are missing is the fact that gluster requires more than one
set of bricks to recover from a dead host. I.e., in your setup you'd
need 6 hosts: 4x replicas and 2x arbiters, with at least one full set
(2x replicas and 1x arbiter) operational as a bare minimum.
Automated commands to fix the volume do not exist otherwise (it's a
Gluster limitation). It can, however, be fixed manually.
Standard Disclaimer: Back up your data first! Fixing this issue
requires manual intervention. The reader assumes all responsibility
for any action resulting from the instructions below. Etc.
If it's just a dead brick (i.e. the host is still functional), all
you really need to do is replace the underlying storage (a rough
sketch of the commands follows below):
1. Take the gluster volume offline.
2. Remove the bad storage device, and attach the replacement.
3. rsync / scp / etc. the data from a known good brick (be sure to
include hidden files and preserve file times, ownership, SELinux
labels, etc.).
4. Restart the gluster volume.
Gluster *might* still need to heal everything after all of that, but it
should start the volume and get it running again.
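A minimal sketch of those steps, assuming the volume and brick names
from your mail (GV2Data, /mnt/LogGFSData/brick) and that
office-wx-hv2-lab-gfs still holds a good copy of the data; adjust
hostnames and paths to your setup:

# Take the volume offline (run on any working peer):
$ gluster volume stop GV2Data
# On the host with the replacement storage mounted at the old brick
# path, copy the data from the good brick. -a keeps times/ownership,
# -A ACLs, -X xattrs (incl. SELinux labels), -H hardlinks; the
# trailing slashes copy the contents, hidden files included:
$ rsync -aAXH root@office-wx-hv2-lab-gfs:/mnt/LogGFSData/brick/ \
      /mnt/LogGFSData/brick/
# Bring the volume back up and let it heal:
$ gluster volume start GV2Data
$ gluster volume heal GV2Data full
$ gluster volume heal GV2Data info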
If the host itself is dead (and the underlying storage is still
functional), you can just move the underlying storage over to the new
host:
1. Take the gluster volume offline.
2. Attach the old storage.
3. Fix up the IDs on the volume file (a rough sketch follows this
list):
https://serverfault.com/questions/631365/rename-a-glusterfs-peer
4. Restart the gluster volume.
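A rough sketch of that ID fix-up, along the lines of the serverfault
answer linked above, assuming the replacement host keeps the old
hostname and the stock /var/lib/glusterd paths; check the linked
answer for the details:

# On a surviving peer, find the UUID gluster expects for the dead host:
$ grep -r office-wx-hv1-lab-gfs /var/lib/glusterd/peers/
# On the replacement host, make glusterd use that UUID so the old
# bricks and volfiles are accepted:
$ systemctl stop glusterd
$ vi /var/lib/glusterd/glusterd.info    # set UUID=<UUID found above>
$ systemctl start glusterd
# Back on a surviving peer, re-probe and confirm the volume files
# still reference the expected hostname:
$ gluster peer probe office-wx-hv1-lab-gfs
$ grep -r office-wx-hv1-lab-gfs /var/lib/glusterd/vols/GV2Data/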
If both the host and underlying storage are dead, you'll need to do
both tasks:
1. Take the gluster volume offline.
2. Attach the new storage.
3. rsync / scp / etc. the data from a known good brick (be sure to
include hidden files and preserve file times, ownership, SELinux
labels, etc.).
4. Fix up the IDs on the volume file, as above.
5. Restart the gluster volume.
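Whichever of these you end up doing, it's worth confirming that the
rebuilt brick carries the volume ID gluster expects before you
restart the volume. A quick check (getfattr comes from the attr
package; the hex value should correspond to the Volume ID shown by
'gluster volume info', without the dashes):

$ getfattr -n trusted.glusterfs.volume-id -e hex /mnt/LogGFSData/brick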
Keep one thing in mind however: if the gluster host you are replacing
is the one oVirt uses to connect to the volume (i.e. it's the host
named in the volume config in the Admin portal), the new host will
need to retain the old hostname / IP, or you'll need to update
oVirt's config. Otherwise the VM hosts will wind up in Unassigned /
Non-functional status.
- Patrick Hibbs
On Sun, 2022-07-17 at 22:15 +0300, Gilboa Davara wrote:
Hello all,
I'm attempting to replace a dead host in a replica 2 + arbiter
gluster setup with a new host.
I've already set up a new host (same hostname..localdomain) and got
it into the cluster.
$ gluster peer status
Number of Peers: 2
Hostname: office-wx-hv3-lab-gfs
Uuid: 4e13f796-b818-4e07-8523-d84eb0faa4f9
State: Peer in Cluster (Connected)
Hostname: office-wx-hv1-lab-gfs.localdomain <------ This is a new host.
Uuid: eee17c74-0d93-4f92-b81d-87f6b9c2204d
State: Peer in Cluster (Connected)
$ gluster volume info GV2Data
Volume Name: GV2Data
Type: Replicate
Volume ID: c1946fc2-ed94-4b9f-9da3-f0f1ee90f303
Status: Stopped
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: office-wx-hv1-lab-gfs:/mnt/LogGFSData/brick <------ This is the dead host.
Brick2: office-wx-hv2-lab-gfs:/mnt/LogGFSData/brick
Brick3: office-wx-hv3-lab-gfs:/mnt/LogGFSData/brick (arbiter)
...
Looking at the docs, it seems that I need to remove the dead brick.
$ gluster volume remove-brick GV2Data office-wx-hv1-lab-gfs:/mnt/LogGFSData/brick start
Running remove-brick with cluster.force-migration enabled can result
in data corruption. It is safer to disable this option so that files
that receive writes during migration are not migrated.
Files that are not migrated can then be manually copied after the
remove-brick commit operation.
Do you want to continue with your current cluster.force-migration
settings? (y/n) y
volume remove-brick start: failed: Removing bricks from replicate
configuration is not allowed without reducing replica count
explicitly
So I guess I need to drop from replica 2 + arbiter to replica 1 +
arbiter (?).
$ gluster volume remove-brick GV2Data replica 1 office-wx-hv1-lab-gfs:/mnt/LogGFSData/brick start
Running remove-brick with cluster.force-migration enabled can result
in data corruption. It is safer to disable this option so that files
that receive writes during migration are not migrated.
Files that are not migrated can then be manually copied after the
remove-brick commit operation.
Do you want to continue with your current cluster.force-migration
settings? (y/n) y
volume remove-brick start: failed: need 2(xN) bricks for reducing
replica count of the volume from 3 to 1
... What am I missing?
- Gilboa