
Hey folks,

Running a 3-way HCI (again (sigh)) on Gluster. The _inside_ of the VMs is backed up separately using Bareos on an hourly basis, so files are present with a worst case of 59 minutes of data loss.

On the outside, I thought of doing Gluster snapshots and then syncing those .snap dirs away to a remote 10gig-connected machine on a weekly-or-so basis. As the contents of the snaps are the oVirt images (the entire DC), I could re-set up Gluster, copy those files back in, and be done with it.

Now some questions, if I may:

- If the hosts remain intact but Gluster dies: I simply set up Gluster, stop the oVirt engine (separate standalone hardware), copy everything back and start the oVirt engine again. All disks are accessible again (tested). The bricks are marked as down (new bricks, same name). There is a "reset brick" button that made the bricks come back online again. What _exactly_ does it do? Does it reset the brick info in oVirt, or copy all the data over from another node and really, really reset the brick?
- If the hosts remain intact but the engine dies: can I re-attach the engine to the running cluster?
- If hosts and engine die and everything needs to be re-set up: would it be possible to run the setup wizard(s) again up to a running point, then copy the disk images to the new gluster-dc-data-dir? Would oVirt rescan the dir for newly found VMs?
- If _one_ host dies, but two hosts and the engine remain online: what's the oVirt way of re-setting up the failed one? Reinstalling the node, and then what? Of all the cases above this is the most likely one.

Having had to reinstall the entire cluster three times already scares me. Always Gluster related.

Again, thank you community for your great efforts!

-- with kind regards, mit freundlichen Gruessen,
Christian Reiss
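For illustration, the snapshot-and-sync idea above boils down to something like the following sketch. All names are invented for the example (volume "vmstore", mount point /mnt/vmstore, target "backup-host"); note that Gluster snapshots require thin-provisioned LVM bricks.

    # assumptions: volume "vmstore" on thin-LVM bricks, FUSE-mounted
    # at /mnt/vmstore on this host; "backup-host" is the 10gig box
    gluster volume set vmstore features.uss enable   # expose .snaps on the mount
    gluster snapshot create weekly vmstore no-timestamp
    gluster snapshot activate weekly

    # ship the read-only snapshot contents to the backup box
    rsync -aHAX --numeric-ids /mnt/vmstore/.snaps/weekly/ \
        backup-host:/backup/vmstore/$(date +%F)/

    gluster snapshot deactivate weekly
    gluster snapshot delete weekly

Restoring would be the same rsync in reverse, into a freshly created volume.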

You should look at the Gluster geo-replication option; I think it would be more appropriate for disaster-recovery purposes. It is also possible to export VMs as OVA files, which can then be reimported back into oVirt. I actually just wrote an Ansible playbook to do this very thing and intend to share my findings and playbooks with the oVirt community, hopefully this week. On Thu, Feb 6, 2020 at 12:18 PM Christian Reiss <email@christian-reiss.de> wrote:
> [quoted original message trimmed]
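For reference, the geo-replication session suggested above is driven from the gluster CLI roughly as follows. The volume name "vmstore" and the remote host/slave volume "backup-host::vmstore-backup" are placeholders; the slave volume has to exist on the remote side first.

    # one-time: create and distribute the pem keys for geo-replication
    gluster system:: execute gsec_create

    # create, start and monitor an asynchronous session from the local
    # volume to the remote (slave) volume
    gluster volume geo-replication vmstore backup-host::vmstore-backup create push-pem
    gluster volume geo-replication vmstore backup-host::vmstore-backup start
    gluster volume geo-replication vmstore backup-host::vmstore-backup status

Note that this replicates continuously rather than on a weekly schedule, which is exactly the concern raised in the reply below.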

Hey Jayme,

thanks for replying. I was wondering about Gluster geo-rep, but what if something like what just happened to me (a Gluster f*ckup) gets replicated too? At this point (having lost 3 HCI clusters to Gluster) I am not really trusting this piece of software with my live data *and* my backups.

I am really protecting myself against Gluster more than anything else. So for backup purposes: the less Gluster, the better.

-Chris.

On 06/02/2020 18:31, Jayme wrote:
> [quoted message trimmed]
-- with kind regards, mit freundlichen Gruessen, Christian Reiss

I understand your concerns and don't have very much personal experience with geo-replication either, aside from knowing it's recommended in the RHEV documentation for disaster recovery. I do believe your specific concern about replicating issues to the geo replica has been considered and protected against by delayed writes and other mechanisms, but I'm not experienced enough with it to say how resilient it is.

Keep an eye on the mailing list: I'm in the process of writing up a blog post and setting up a GitHub repo to share what I've been doing with Ansible to back up VMs; this simple approach may work for you. The whole reason I wanted to find a simple way to back up full VM images was exactly your concern: I'm worried about a major GlusterFS issue bringing down all my VMs, and I want to be sure I have a way to recover them.

On Thu, Feb 6, 2020 at 2:07 PM Christian Reiss <email@christian-reiss.de> wrote:
> [quoted message trimmed]

My system was running well until I tried to upgrade to 4.3.8 - Gluster and Engine died. From my perspective it now seems that content backup to NFS shares over 10 GbE is a must: Gluster is fine until it isn't, and when it isn't you can lose everything. I will implement the solution below. My current separate KVM setup has been rock solid.
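In that spirit, a backup target on NFS is just a mount plus the same rsync as sketched earlier in the thread (hostnames and paths are again invented for illustration):

    # mount the 10GbE-connected NFS share and copy the snapshot contents over
    mount -t nfs backup-host:/export/ovirt-backups /mnt/backup
    rsync -aHAX --numeric-ids /mnt/vmstore/.snaps/weekly/ /mnt/backup/$(date +%F)/
    umount /mnt/backup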
On 6 Feb 2020, at 18:07, Christian Reiss <email@christian-reiss.de> wrote:
> [quoted message trimmed]

On February 6, 2020 6:06:18 PM GMT+02:00, Christian Reiss <email@christian-reiss.de> wrote:
> [quoted original message trimmed]
Gluster's reset-brick actually wipes the brick and starts a heal process from another brick. If your node dies, oVirt won't allow you to remove it from the cluster until you restore the 'replica 3' status of Gluster. I think the fastest way to restore a node is (see the sketch below):

1. Reinstall the node with the same hostname and network settings.
2. Restore the Gluster directory /var/lib/glusterd/ from backup.
3. Restart the node and initiate a reset-brick.
4. Go to the UI and remove the node that was defective.
5. Add the node again.

Voila.

About the Gluster issues: you are not testing your upgrades enough, and if you use the cluster in production that will be quite disruptive. For example, the ACL issue I ran into (and you did too, actually) was discussed on the mailing list for two weeks before I managed to resolve it. I'm using the latest oVirt with Gluster v7, but this is my lab and I can afford a downtime of a week (or even more). The more tested an oVirt/Gluster release is, the more reliable it will be.

Best Regards,
Strahil Nikolov
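A rough sketch of what steps 2 and 3 above might look like on the shell. The volume name "vmstore", node name "node1" and brick path are placeholders, and it assumes a tarball of /var/lib/glusterd was taken from the node while it was still healthy:

    # beforehand, on each healthy node: keep a copy of the glusterd state
    tar czf /root/glusterd-backup.tar.gz /var/lib/glusterd

    # on the freshly reinstalled node (same hostname/IP as before):
    systemctl stop glusterd
    tar xzf /root/glusterd-backup.tar.gz -C /   # restores /var/lib/glusterd
    systemctl start glusterd

    # wipe the brick and resync it from the remaining replicas; this is
    # the CLI counterpart of the "reset brick" button in the UI
    gluster volume reset-brick vmstore node1:/gluster_bricks/vmstore/brick start
    gluster volume reset-brick vmstore node1:/gluster_bricks/vmstore/brick \
        node1:/gluster_bricks/vmstore/brick commit force

    # then watch the heal catch up
    gluster volume heal vmstore info

The commit-force step reuses the same path as the "new" brick, which is what triggers the wipe-and-heal behaviour described above.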
participants (4)
- Christian Reiss
- Jayme
- Rob
- Strahil Nikolov