On February 6, 2020 6:06:18 PM GMT+02:00, Christian Reiss <email(a)christian-reiss.de> wrote:
Hey folks,
Running a 3-way HCI (again (sigh)) on Gluster. Now, the _inside_ of the
VMs is backed up separately using Bareos on an hourly basis, so files
are present with a worst case of 59 minutes of data loss.
Now, on the outside I thought of doing Gluster snapshots and then
syncing those .snap dirs away to a remote 10gig-connected machine on a
weekly-or-so basis. As the contents of the snaps are the oVirt images
(the entire DC), I could re-set up Gluster, copy those files back in,
and be done with it (see the sketch below).
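Roughly what I have in mind, as a sketch (the volume name 'data', the
mount point /mnt/data, and the host 'backuphost' are placeholders; note
that Gluster snapshots require thin-LVM bricks):

  SNAP="weekly-$(date +%F)"

  # one-time: expose snapshots under <mountpoint>/.snaps/
  # (User Serviceable Snapshots)
  gluster volume set data features.uss enable

  # take a snapshot of the volume and make its contents readable
  gluster snapshot create "$SNAP" data no-timestamp
  gluster snapshot activate "$SNAP"

  # sync the snapshot contents to the remote 10gig box
  rsync -aHAX --numeric-ids "/mnt/data/.snaps/$SNAP/" backuphost:/backups/data/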
Now some questions, if I may:
- If the hosts remain intact but Gluster dies, I simply set up Gluster,
stop the oVirt engine (separate standalone hardware), copy everything
back, and start the oVirt engine again. All disks are accessible again
(tested). The bricks are marked as down (new bricks, same name). There
is a "reset brick" button that made the bricks come back online again.
What _exactly_ does it do? Does it reset the brick info in oVirt, or
copy all the data over from another node and really, really reset the
brick?
- If the hosts remain intact, but the engine dies: can I re-attach the
engine to the running cluster?
- If the hosts and engine die and everything needs to be re-set up:
would it be possible to run the setup wizard(s) again up to a running
state, then copy the disk images into the new gluster-dc-data-dir?
Would oVirt rescan the dir for newly found VMs?
- If _one_ host dies, but two hosts and the engine remain online:
what's the oVirt way of setting the failed one back up? Reinstalling
the node, and then what? Of all the cases above, this is the most
likely one.
Having had to reinstall the entire cluster three times already scares
me. It was always Gluster-related.
Again thank you community for your great efforts!
Gluster's reset-brick actually wipes the brick and starts a heal from
another brick.
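From the CLI, the equivalent is roughly this (the volume name 'data'
and the host/brick path are just examples):

  # take the brick offline
  gluster volume reset-brick data node1:/gluster_bricks/data/brick start

  # re-add the same brick; 'commit force' reuses the (empty) brick and
  # heals its contents from the other replicas
  gluster volume reset-brick data node1:/gluster_bricks/data/brick \
      node1:/gluster_bricks/data/brick commit force

  # watch the heal progress
  gluster volume heal data info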
If your node dies, oVirt won't allow you to remove it until you restore
the 'replica 3' status of Gluster.
I think the fastest way to restore a node is:
1. Reinstall the node with the same hostname and network settings
2. Restore the Gluster config directory /var/lib/glusterd/ from backup
3. Restart the node and initiate a reset-brick (command sketch below)
4. Go to the UI and remove the node that was defective
5. Add the node again
Voila.
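Steps 2 and 3 in commands, roughly (this assumes you kept a tarball of
/var/lib/glusterd/; volume and host names are the same examples as
above):

  # stop glusterd before restoring its config
  systemctl stop glusterd

  # restore peer info and volume definitions (assumes the backup was
  # made with: tar -czf glusterd-backup.tar.gz -C /var/lib glusterd)
  tar -xzf glusterd-backup.tar.gz -C /var/lib/

  # start glusterd so the node rejoins the trusted pool
  systemctl start glusterd
  gluster peer status

Once the peers show 'Connected', run the reset-brick sequence from
above and let the heal finish before putting load back on the node.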
About the Gluster issues: you are not testing your upgrades enough, and
if you use the cluster in production, that will be quite disruptive.
For example, the ACL issue I ran into (and actually you did, too) was
discussed on the mailing list for 2 weeks before I managed to resolve
it.
I'm using the latest oVirt with Gluster v7 - but this is my lab and I
can afford a week (or even more) of downtime. The more tested an
oVirt/Gluster release is, the more reliable it will be.
Best Regards,
Strahil Nikolov