
On February 6, 2020 6:06:18 PM GMT+02:00, Christian Reiss <email@christian-reiss.de> wrote:
Hey folks,
Running a 3-way HCI (again (sigh)) on Gluster. The _inside_ of the VMs is backed up separately using Bareos on an hourly basis, so files are recoverable with a worst-case data loss of 59 minutes.
Now, on the outside, I thought of doing Gluster snapshots and then syncing those .snaps dirs away to a remote 10GbE-connected machine on a weekly-or-so basis. As the contents of the snapshots are the oVirt images
(the entire DC), I could re-set up Gluster, copy those files back in, and be done with it.
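Roughly what I have in mind (volume name "data", snapshot name, mount point and target host are placeholders, and the snapshot-mount syntax should be double-checked against the installed Gluster version):

  # snapshot the data volume and activate it
  gluster snapshot create weekly data no-timestamp
  gluster snapshot activate weekly

  # an activated snapshot can be FUSE-mounted read-only and synced off
  mkdir -p /mnt/snap
  mount -t glusterfs node1:/snaps/weekly/data /mnt/snap
  rsync -aHAX /mnt/snap/ backuphost:/srv/gluster-backup/weekly/
  umount /mnt/snap

  # clean up once the copy is verified
  gluster snapshot deactivate weekly
  gluster snapshot delete weekly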
Now some questions, if I may:
- If the hosts remain intact but Gluster dies, I simply set up Gluster again, stop the oVirt engine (separate standalone hardware), copy everything back, and start the engine again. All disks are accessible again (tested). The bricks are marked as down (new bricks, same name). There is a "reset brick" button that made the bricks come back online again. What _exactly_ does it do? Does it just reset the brick info in oVirt, or does it copy all the data over from another node and really, really reset the brick?
- If the hosts remain intact, but the engine dies: can I re-attach the engine to the running cluster? (See the restore sketch after this list.)
- If both the hosts and the engine die and everything needs to be set up from scratch: would it be possible to run the setup wizard(s) again up to a working state, then copy the disk images into the new gluster DC data dir? Would oVirt rescan the dir and pick up the newly found VMs?
- If _one_ host dies, but the other two and the engine remain online: what's the oVirt way of re-setting up the failed one? Reinstall the node and then what? Of all the cases above, this is the most likely one.
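For the engine case, I assume the supported route is an engine-backup archive restored onto a fresh host with the same FQDN - something like the following (paths are placeholders; the restore flags should be checked against the installed oVirt version). Is that correct?

  # taken regularly on the running engine
  engine-backup --mode=backup --file=/backup/engine.bck --log=/backup/backup.log

  # on a freshly installed engine host with the same FQDN,
  # before engine-setup has ever been run:
  engine-backup --mode=restore --file=/backup/engine.bck \
      --log=/backup/restore.log --provision-db --restore-permissions
  engine-setup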
Having had to reinstall the entire cluster three times already scares me. Always Gluster-related.
Again, thank you, community, for your great efforts!
Gluster's reset-brick actually wipes the brick and starts a heal from another brick. If your node dies, oVirt won't allow you to remove it until you restore the 'replica 3' status of Gluster. I think the fastest way to restore a node is:

1. Reinstall the node with the same hostname and network settings.
2. Restore the Gluster config directory /var/lib/glusterd/ from backup.
3. Restart the node and initiate a reset-brick.
4. Go to the UI and remove the node that was defective.
5. Add the node again.

Voila.

About the Gluster issues - you are not testing your upgrades enough, and if you use the cluster in production, that will be quite disruptive. For example, the ACL issue I met (and actually you too) was discussed on the mailing list for two weeks before I managed to resolve it. I'm using the latest oVirt with Gluster v7 - but this is my lab and I can afford a week (or even more) of downtime. The more tested an oVirt/Gluster release is, the more reliable it will be.

Best Regards,
Strahil Nikolov
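P.S. For reference, the CLI sequence behind a brick reset looks roughly like this (volume name "data" and the brick path are assumptions based on the usual oVirt HCI layout):

  # take the old brick offline
  gluster volume reset-brick data node1:/gluster_bricks/data/data start

  # recreate/re-mount the brick filesystem here if it was wiped, then
  # bring the (empty) brick back; the heal repopulates it from the peers
  gluster volume reset-brick data node1:/gluster_bricks/data/data \
      node1:/gluster_bricks/data/data commit force

  # watch the heal catch up
  gluster volume heal data info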