
On February 6, 2020 6:06:18 PM GMT+02:00, Christian Reiss <email@christian-reiss.de> wrote:
Hey folks,
Running a 3-way HCI (again (sigh)) on Gluster. The _inside_ of the VMs is backed up separately using Bareos on an hourly basis, so files are recoverable with a worst-case data loss of 59 minutes.
Now, on the outside, I thought of doing Gluster snapshots and then syncing those .snaps dirs away to a remote 10GbE-connected machine on a weekly-or-so basis. As the contents of the snapshots are the oVirt images
(the entire DC), I could re-set up Gluster, copy those files back in, and be done with it.
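Roughly what I have in mind (volume name "data", snapshot name, mount point and target host are placeholders, and the snapshot-mount syntax should be double-checked against the installed Gluster version):

  # snapshot the data volume and activate it
  gluster snapshot create weekly data no-timestamp
  gluster snapshot activate weekly

  # an activated snapshot can be FUSE-mounted read-only and synced off
  mkdir -p /mnt/snap
  mount -t glusterfs node1:/snaps/weekly/data /mnt/snap
  rsync -aHAX /mnt/snap/ backuphost:/srv/gluster-backup/weekly/
  umount /mnt/snap

  # clean up once the copy is verified
  gluster snapshot deactivate weekly
  gluster snapshot delete weekly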
Now some questions, if I may:
- If the hosts remain intact but Gluster dies, I simply set up Gluster again, stop the oVirt engine (separate standalone hardware), copy everything back, and start the engine again. All disks are accessible again (tested). The bricks are marked as down (new bricks, same name). There is a "reset brick" button that made the bricks come back online again. What _exactly_ does it do? Does it just reset the brick info in oVirt, or does it copy all the data over from another node and really, really reset the brick?
- If the hosts remain intact, but the engine dies: can I re-attach the engine to the running cluster? (See the restore sketch after this list.)
- If both the hosts and the engine die and everything needs to be set up from scratch: would it be possible to run the setup wizard(s) again up to a working state, then copy the disk images into the new gluster DC data dir? Would oVirt rescan the dir and pick up the newly found VMs?
- If _one_ host dies, but the other two and the engine remain online: what's the oVirt way of re-setting up the failed one? Reinstall the node and then what? Of all the cases above, this is the most likely one.
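For the engine case, I assume the supported route is an engine-backup archive restored onto a fresh host with the same FQDN - something like the following (paths are placeholders; the restore flags should be checked against the installed oVirt version). Is that correct?

  # taken regularly on the running engine
  engine-backup --mode=backup --file=/backup/engine.bck --log=/backup/backup.log

  # on a freshly installed engine host with the same FQDN,
  # before engine-setup has ever been run:
  engine-backup --mode=restore --file=/backup/engine.bck \
      --log=/backup/restore.log --provision-db --restore-permissions
  engine-setup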
Having had to reinstall the entire cluster three times already scares me. Always Gluster-related.
Again, thank you, community, for your great efforts!
Gluster's reset-brick actually wipes the brick and starts a heal from another brick. If your node dies, oVirt won't allow you to remove it until you restore the 'replica 3' status of Gluster. I think the fastest way to restore a node is:

1. Reinstall the node with the same hostname and network settings.
2. Restore the Gluster config directory /var/lib/glusterd/ from backup.
3. Restart the node and initiate a reset-brick.
4. Go to the UI and remove the node that was defective.
5. Add the node again.

Voila.

About the Gluster issues - you are not testing your upgrades enough, and if you use the cluster in production, that will be quite disruptive. For example, the ACL issue I met (and actually you too) was discussed on the mailing list for two weeks before I managed to resolve it. I'm using the latest oVirt with Gluster v7 - but this is my lab and I can afford a week (or even more) of downtime. The more tested an oVirt/Gluster release is, the more reliable it will be.

Best Regards,
Strahil Nikolov
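P.S. For reference, the CLI sequence behind a brick reset looks roughly like this (volume name "data" and the brick path are assumptions based on the usual oVirt HCI layout):

  # take the old brick offline
  gluster volume reset-brick data node1:/gluster_bricks/data/data start

  # recreate/re-mount the brick filesystem here if it was wiped, then
  # bring the (empty) brick back; the heal repopulates it from the peers
  gluster volume reset-brick data node1:/gluster_bricks/data/data \
      node1:/gluster_bricks/data/data commit force

  # watch the heal catch up
  gluster volume heal data info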