Hi David,

I hope you manage to recover the VM, or at least most of the data. If the VM has multiple disks (easily observable in the oVirt UI), you may need to repeat the same procedure for the rest of the disks.

Check the inode size (isize) with xfs_info, as the default used to be 256, but I have noticed that in some cases mkfs.xfs picked a higher value (on EL7). Also, check Gluster's logs, or at least keep them for a later review. A smaller inode size can cause a lot of really awkward issues in Gluster (its extended attributes may no longer fit inside the inode), but this needs to be verified.
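For example, a quick check on the brick filesystem (the mount point /gluster_bricks/data below is just a placeholder -- use your own brick path):

    # Print the XFS geometry; the isize= field on the meta-data line is the inode size
    xfs_info /gluster_bricks/data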

Once the RAID is fully rebuilt, you will have to add both the HW RAID brick and the arbiter brick in a single operation (add-brick replica 3 arbiter 1). As you will be reusing the arbiter brick, the safest approach is to run mkfs.xfs on it again and also increase the inode ratio to 90%, since an arbiter brick stores only metadata and needs inodes far more than data space.
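A rough sketch of those two steps -- the volume name (data), device, hosts and brick paths below are placeholders, so substitute your own:

    # Reformat the reused arbiter brick with 512-byte inodes and a higher
    # maximum inode percentage (an arbiter holds only metadata)
    mkfs.xfs -f -i size=512 -i maxpct=90 /dev/gluster_vg/arbiter_lv

    # Add the rebuilt data brick and the arbiter brick back in one command,
    # converting the volume back to replica 3 arbiter 1
    # (the last brick listed becomes the arbiter)
    gluster volume add-brick data replica 3 arbiter 1 \
        host2:/gluster_bricks/data/brick host3:/gluster_bricks/arbiter/brick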

Can you provide your volume info? The default shard size is just 64MB and transferring each shard is quite fast, so there should be no locking or the symptoms you reported.
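Something like the following would show the replica layout and the shard settings (assuming the volume is named data):

    gluster volume info data
    # or just the shard block size option
    gluster volume get data features.shard-block-size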

Once the healing is over, you should be ready to rebuild the other node.
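Healing progress can be tracked with (again assuming the volume name data):

    # Lists the entries still pending heal on each brick; wait until it is empty
    gluster volume heal data info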

Best Regards,
Strahil Nikolov


Ok, so right now, my production cluster is operating off of a single brick. I was planning to expand the storage on the 2nd host next week, add it back into the cluster, and get the Replica 2, Arbiter 1 redundancy working again.

How would you recommend I proceed with that plan, knowing that I'm currently operating off of a single brick for which I did NOT specify the inode size with `mkfs.xfs -i size=512`?
Should I specify the inode size on the new brick I build next week, and then, once everything is healed, reformat the current brick?

> And then there is a lot of information missing between the lines: I guess you are using a 3 node HCI setup and were adding new disks (/dev/sdb) on all three nodes and trying to move the glusterfs to those new bigger disks?

You are correct: I'm using a 3-node HCI setup. I originally built HCI with Gluster replication on all 3 nodes (Replica 3). As I'm increasing the storage, I'm also moving to a Replica 2/Arbiter 1 architecture. So yes, the plan was (a rough sketch of the remove-brick commands involved follows the list):

1) Convert FROM Replica 3 TO replica 2/arbiter 1
2) Convert again down to a Replica 1 (so no replication... just operating storage on a single host)
3) Rebuild the RAID array (with larger storage) on one of the unused hosts, and rebuild the gluster bricks
4) Add the larger RAID back into gluster, let it heal
5) Now, remove the bricks from the host with the smaller storage -- THIS is where things went awry, and what caused the data loss on this 1 particular VM
--- This is where I am currently ---
6) Rebuild the RAID array on the remaining host that is now unused (This is what I am / was planning to do next week)
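
For reference, steps 2 and 5 are replica-reducing remove-brick operations, roughly of this form (the volume name, hosts and brick paths are placeholders):

    # e.g. dropping from replica 2 + arbiter down to a single brick;
    # 'force' is required because no data is migrated off the removed bricks
    gluster volume remove-brick data replica 1 \
        host2:/gluster_bricks/data/brick host3:/gluster_bricks/arbiter/brick force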




Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, August 5th, 2021 at 3:12 PM, Thomas Hoberg <thomas@hoberg.net> wrote:

> If you manage to export the disk image via the GUI, the result should be a qcow2 format file, which you can mount/attach to anything Linux (well, if the VM was Linux... it didn't say)
>

> But it's perhaps easier to simply try to attach the disk of the failed VM as a secondary to a live VM to recover the data.
>
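
If the export route works, the resulting qcow2 can be attached on any Linux box via NBD -- a sketch, assuming qemu-utils is installed, the exported file is /tmp/vm_disk.qcow2 and the data lives on the first partition (all of these are placeholders):

    # Expose the qcow2 image as a block device through the NBD kernel module
    modprobe nbd max_part=8
    qemu-nbd --connect=/dev/nbd0 /tmp/vm_disk.qcow2

    # Mount the first partition read-only and copy the data off
    mount -o ro /dev/nbd0p1 /mnt

    # Clean up when done
    umount /mnt
    qemu-nbd --disconnect /dev/nbd0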
