
Hello users,

After a painful experience with a crashed engine and a problematic recovery from a backup, I am thinking of creating one more engine in standby mode on the same network, which will automatically restore backup files from the active engine on a daily/hourly basis. This way I'll run recovery tests automatically every day and can easily switch to the passive engine in case of engine failure with a DNS record change.

So the main question: is it harmful to have more than one engine alive if only one is being modified? Can the standby engine affect the host nodes?

On Thu, Jan 10, 2019 at 10:56 PM maoz zadok <maozza@gmail.com> wrote:
> Hello users, After a painful experience with a crashed engine and a problematic recovery from a backup, I was thinking to create one more engine in standby mode in the same network, that will automatically recover backup files from the active engine on a daily/hourly basis. This way I'll run recovery tests every day automatically and easily switch in case of engine failure with a DNS record change to the passive engine.

Makes sense. I think I have heard of people doing this, but I do not have details.

> So the main question, is it harmful to have more than one engine alive if only one is being modified?

It's harmful even if they are identical, because they do not know about each other. Each thinks it is the only one controlling the hosts, and they might give the hosts conflicting commands.

> Can the standby engine affect the host nodes?

Yes, that's the main problem. If you go this way, I suggest not letting the backup server access your hosts normally, and thinking of some other way to test it. When you need to activate it, open access to the hosts (and change DNS). Obviously, this requires very careful planning, implementation and ongoing operation.

Most people who need an HA engine use ovirt-hosted-engine, and some others use a single engine VM managed by some other means (e.g. some other virt solution, another instance of oVirt, or plain KVM and libvirt plus some custom HA stuff).

Best regards,
-- Didi

On 13.01.2019 10:47, Yedidyah Bar David wrote:
> Most people that need HA engine use ovirt-hosted-engine,

HA hosted-engine cannot help if the VM image is broken. HA runs the same image on different nodes, so if the currently running VM corrupts its filesystem, for example, it cannot run on the other nodes either. I wrote about this experience here some time ago.

-- Mike

In that case, my understanding is that you use the backup feature within hosted-engine, and if you need to recover, you use that backup, which is an option during hosted-engine installation.

I’m all new to this, and I know I haven’t read all the administration documentation (though what I have read is very good). It might be helpful, if it doesn’t exist, to have a separate “disaster recovery” practices page to cover all the terrible things that could happen to a running system. The first part would be activities that administrators should be doing regularly while the system is healthy, then a section on testing the validity of those backups. Finally, multiple subsections on “if this happens, then use this, this, and this that you have from section A to recover your system.”

I’d be willing to help create that, since I need to create the documents for our business systems.

Tom Albrecht III
Cyber Architect
Lockheed Martin RMS
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/WK4IT3NMVOWGIC...

Is it good enough to disable the engine service on the standby node with "systemctl disable ovirt-engine"? If it is, what about the other services? Do I have to disable the following as well?

ovirt-engine-dwhd.service            enabled
ovirt-engine.service                 enabled
ovirt-fence-kdump-listener.service   enabled
ovirt-imageio-proxy.service          enabled
ovirt-provider-ovn.service           enabled
ovirt-vmconsole-proxy-sshd.service   enabled
ovirt-websocket-proxy.service        enabled

Thank you!
maoz
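The question above can be made concrete with a small sketch. It only generates the systemctl invocations for the units listed in the message, so it can be reviewed before anything is run. Whether disabling these units is sufficient is not confirmed anywhere in this thread; the earlier advice was to also block the standby machine's network access to the hosts. The function names are hypothetical, and `--now` is added because `systemctl disable` alone does not stop a unit that is already running.

```python
import shlex

# Unit names copied from the list in the message above.
STANDBY_UNITS = [
    "ovirt-engine-dwhd.service",
    "ovirt-engine.service",
    "ovirt-fence-kdump-listener.service",
    "ovirt-imageio-proxy.service",
    "ovirt-provider-ovn.service",
    "ovirt-vmconsole-proxy-sshd.service",
    "ovirt-websocket-proxy.service",
]


def standby_commands(units, activate=False):
    """Build systemctl argv lists (hypothetical helper).

    activate=False: disable and stop each unit (standby mode).
    activate=True:  enable and start each unit (failover).
    """
    verb = "enable" if activate else "disable"
    return [["systemctl", verb, "--now", unit] for unit in units]


def render(commands):
    """Turn argv lists into shell-safe command lines for review."""
    return [shlex.join(argv) for argv in commands]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for line in render(standby_commands(STANDBY_UNITS)):
        print(line)
```

Printing instead of executing is deliberate: the output can be pasted into a root shell on the standby node once it has been checked, and calling `standby_commands(STANDBY_UNITS, activate=True)` gives the reverse set for a failover.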

On Sun, Jan 13, 2019 at 9:40 PM <alex@triadic.us> wrote:
> For what it's worth we do active passive with pacemaker, corosync, and drbd. All the configuration files stay synced by drbd, pacemaker ensures the services are only running on one node. Works pretty well.

For more demanding scenarios there is also an Ansible role to handle the disaster recovery of an oVirt environment: https://github.com/oVirt/ovirt-ansible-disaster-recovery

Here is a demo of how to handle the mapping between the two sites: https://youtu.be/mEOgH-Tk09c

I'm still sort of new to oVirt, but I went through something similar. I had my original engine fail and had to recover, so here is my "oVirt HA plan":

1. I do NOT use hosted oVirt. I had issues getting it deployed correctly, and it doesn't help if the engine VM itself has issues. My engine is hosted on a completely separate 2-node hyper-converged Hyper-V cluster. Unless you have a cluster larger than 3 hosts, I really don't think hosted oVirt is a good idea; it would make more sense to just load oVirt on another PC by itself.
2. I plan on loading another copy of oVirt in a "cold storage" configuration. It will be loaded on CentOS and configured as closely as I can without adding any hosts. I'll probably keep it turned off and try to update it once a month or so.
3. If I have another oVirt failure, I know to log in, put the storage domains into maintenance mode if possible, and copy out any needed config items I may have missed. I will then shut it off and delete it. I will then reboot all the hosts one at a time to clear out any sanlock issues, add the hosts to the new copy of oVirt, and import the storage. I estimate that process to take around 2-4 hours.

Well, I really love oVirt, but I don't know... All the solutions mentioned here are complicated and/or dangerous, including the hosted HA engine, which fails while deploying (for me).

I think that testing the backup for recovery is very important and needs to be done on a regular basis. What good is a backup if you cannot restore? I worked a whole night trying to recover the failed engine; recovery from backup was very painful.

Does anyone here have a solution for testing backups (not in crisis mode)?
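One possible shape for a routine restore test is sketched below. It only builds the `engine-backup` command lines for the active engine (backup) and for a scratch machine (restore), so they can be reviewed or handed to a scheduler. The file paths and function names are hypothetical. The flags shown are the commonly documented ones, but a given installation may need more (for example, extra provisioning flags if the data warehouse database is included in the backup), and a restore is normally followed by `engine-setup`. As advised earlier in the thread, the machine used for the test should not be able to reach the hosts.

```python
import shlex


def backup_command(backup_file, log_file):
    """argv for a full backup, to be run on the active engine."""
    return [
        "engine-backup",
        "--mode=backup",
        "--scope=all",
        f"--file={backup_file}",
        f"--log={log_file}",
    ]


def restore_command(backup_file, log_file):
    """argv for a restore, to be run on an isolated test machine."""
    return [
        "engine-backup",
        "--mode=restore",
        f"--file={backup_file}",
        f"--log={log_file}",
        "--provision-db",         # create the engine database on the test machine
        "--restore-permissions",  # restore database permissions from the backup
    ]


if __name__ == "__main__":
    # Hypothetical paths, chosen only for illustration.
    backup_file = "/var/backups/engine-backup.tar.bz2"
    print(shlex.join(backup_command(backup_file, "/var/log/engine-backup.log")))
    print(shlex.join(restore_command(backup_file, "/var/log/engine-restore.log")))
```

A test run counts as passed only if the restore command exits successfully and the restored engine can be set up afterwards; checking the exit status and keeping the restore log are the minimum worth automating.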

Real HA is complicated, no way around that...

As stated earlier, we also run the engine on bare metal, using pacemaker / corosync / DRBD to keep both nodes in perfect sync; failover happens in a few seconds. We also do daily backups of the engine, but in the 4 years or so that we have been running oVirt, we have luckily never had to use them with this setup.

STONITH is pretty important to set up if you are running fewer than three nodes for the engine, just to keep split brain from corrupting everything.
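The remark about fewer than three nodes can be illustrated with simple majority arithmetic. This is a sketch that assumes a plain majority-vote quorum rule with one vote per node; the actual cluster configuration used in the setup above is not given in the thread, and the function names are hypothetical.

```python
def votes_needed(total_nodes):
    """Smallest number of votes that is a strict majority."""
    return total_nodes // 2 + 1


def has_quorum(reachable_nodes, total_nodes):
    """True if a partition seeing `reachable_nodes` holds a majority."""
    return reachable_nodes >= votes_needed(total_nodes)


if __name__ == "__main__":
    # Two nodes, network split: each side sees only itself.
    print("2 nodes, side of 1:", has_quorum(1, 2))
    # Three nodes, split 2 + 1: exactly one side keeps a majority.
    print("3 nodes, side of 2:", has_quorum(2, 3))
    print("3 nodes, side of 1:", has_quorum(1, 3))
```

With two nodes, neither side of a split can reach a majority, so voting alone cannot decide which node may keep running the engine. Fencing (STONITH) settles it by forcibly powering off one node before the other takes over. With three nodes, a 2 + 1 split always leaves exactly one side with a majority.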

Hi Alex,

could you share more details on your setup? I also had issues with restoring the engine from a backup, and it made me think of running the engine not as a self-hosted HA setup. I don’t have any experience with pacemaker/corosync and DRBD, but I’m willing to test and find out how to use these for the engine. I think details on your config would provide a good starting point. Of course, a working backup is necessary with any solution.

Thank you,
Sven
participants (9)
- Albrecht, Thomas C
- Alex McWhirter
- alex@triadic.us
- maoz zadok
- michael@wanderingmad.com
- Mike
- Simone Tiraboschi
- Sven Achtelik
- Yedidyah Bar David