Active-Passive DR: mutual for different storage domains possible?

Hello, suppose I want to implement Active-Passive DR between 2 sites. Sites are SiteA and SiteB. I have 2 storage domains, SD1 and SD2, that I can configure so that SD1 is active on the storage array installed in SiteA with a replica in SiteB, and SD2 the reverse. I have 4 hosts: host1 and host2 in SiteA and host3 and host4 in SiteB. I would like to optimize compute resources and workload so that: oVirt env OV1 with ovmgr1 in SiteA (external engine) is composed of host1 and host2 and configured with SD1; oVirt env OV2 with ovmgr2 in SiteB (external engine) is composed of host3 and host4 and configured with SD2. Can I use OV2 as DR for OV1 for VMs installed on SD1 and at the same time OV1 as DR for OV2 for VMs installed on SD2? Thanks in advance, Gianluca Cecchi

I don't see any reason not to do it, as long as the SD replicas are separate storage domains. Just note that for the DR you should prepare a separate DC with a cluster.
P.S. - I must admit that I didn't try this configuration - please share your results.
On Mon, Jul 8, 2019 at 10:45 AM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello, suppose I want to implement Active-Passive DR between 2 sites. Sites are SiteA and SiteB. I have 2 storage domains, SD1 and SD2, that I can configure so that SD1 is active on the storage array installed in SiteA with a replica in SiteB, and SD2 the reverse. I have 4 hosts: host1 and host2 in SiteA and host3 and host4 in SiteB.
I would like to optimize compute resources and workload so that:
oVirt env OV1 with ovmgr1 in SiteA (external engine) is composed of host1 and host2 and configured with SD1
oVirt env OV2 with ovmgr2 in SiteB (external engine) is composed of host3 and host4 and configured with SD2.
Can I use OV2 as DR for OV1 for VMs installed on SD1 and at the same time OV1 as DR for OV2 for VMs installed on SD2?
Thanks in advance, Gianluca Cecchi
-- Regards, Eyal Shenitzky

On Mon, Jul 8, 2019 at 10:39 AM Eyal Shenitzky <eshenitz@redhat.com> wrote:
I don't see any reason not to do it, as long as the SD replicas are separate storage domains. Just note that for the DR you should prepare a separate DC with a cluster.
P.S. - I must admit that I didn't try this configuration - please share your results.
Thanks for your insights, Eyal. I'm going ahead with the tests. One question arose after creating disaster_recovery_maps.yml, since I need to populate all the "secondary_xxx" variable mappings.
In my scenario the primary DC DC1 in Site A has the same network configuration as the primary DC DC2 in Site B. In fact the main target is to reach better utilization of the available resources, so in normal conditions VMs in DC1 potentially communicate with VMs in DC2. Now, to configure DR I have to create a mapping of DC1 in Site B: if I want to leverage the hosts' resources in Site B, I'm forced to map it to DC2, correct? That is the current primary for its storage domain SD2; otherwise I will have no hosts to assign to the cluster inside it... What is the risk of overlapping objects in this case (supposing I personally take care not to have VMs in DC1 with the same names as VMs in DC2, and the same for storage domains' names)? Could I have an object, such as a disk ID, that during import would overlap with existing objects in the database? Or will the engine re-create new IDs (for vNICs, disks, etc.) while importing them?
Another scenario could be to create, inside the Site B environment, another data center named DC1-DR; I think I would also have to create the same logical networks as DC1 (and DC2, incidentally), and in case of DR I would have to take one of the hosts out of DC2 and assign it to DC1-DR....
Opinions?
Thanks in advance, Gianluca
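P.S. To make the question more concrete, the part of disaster_recovery_maps.yml I'm referring to is roughly the following. This is only a sketch: the cluster/network names are placeholders for my setup, and the exact keys may differ slightly from what the generate step produces in your version, so check against your own generated file.

  dr_cluster_mappings:
  - primary_name: DC1-Cluster          # cluster currently running the SD1 VMs in Site A
    secondary_name: DC1-DR-Cluster     # cluster in Site B that should receive them on failover

  dr_network_mappings:
  - primary_network_name: ovirtmgmt
    primary_profile_name: ovirtmgmt
    secondary_network_name: ovirtmgmt
    secondary_profile_name: ovirtmgmt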

On Tue, Jul 23, 2019 at 6:11 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Jul 8, 2019 at 10:39 AM Eyal Shenitzky <eshenitz@redhat.com> wrote:
I don't see any reason not to do it, as long as the SD replicas are separate storage domains. Just note that for the DR you should prepare a separate DC with a cluster.
P.S. - I must admit that I didn't try this configuration - please share your results.
Thanks for your insights, Eyal. I'm going ahead with the tests. One question arose after creating disaster_recovery_maps.yml, since I need to populate all the "secondary_xxx" variable mappings.
In my scenario the primary DC DC1 in Site A has the same network configuration as the primary DC DC2 in Site B. In fact the main target is to reach better utilization of the available resources, so in normal conditions VMs in DC1 potentially communicate with VMs in DC2. Now, to configure DR I have to create a mapping of DC1 in Site B: if I want to leverage the hosts' resources in Site B, I'm forced to map it to DC2, correct?
You can use the following manual to understand what is required for the DR process - https://ovirt.org/documentation/disaster-recovery-guide/active_passive_overv...
You need the following entities in the secondary site:
- An active Red Hat Virtualization Manager.
- A data center and clusters.
- Networks with the same general connectivity as the primary site.
- Active hosts capable of running critical virtual machines after failover.
It means you should have at least a dedicated host to perform all the operations on the secondary site (if you have many running VMs you will need more than one host in order to provide a full backup solution).
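For example, the "active Manager in the secondary site" is the one you point the generated var file at; in your case that part of the file should look roughly like this (variable names as I recall them from the generated example, and the URL, username and CA path are only placeholders for your ovmgr2 - verify them against your own file):

  dr_sites_secondary_url: https://ovmgr2.example.com/ovirt-engine/api
  dr_sites_secondary_username: admin@internal
  dr_sites_secondary_ca_file: /etc/pki/ovirt-engine/ca.pem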
That is the current primary for its storage domain SD2; otherwise I will have no hosts to assign to the cluster inside it... What is the risk of overlapping objects in this case (supposing I personally take care not to have VMs in DC1 with the same names as VMs in DC2, and the same for storage domains' names)? Could I have an object, such as a disk ID, that during import would overlap with existing objects in the database? Or will the engine re-create new IDs (for vNICs, disks, etc.) while importing them?
The IDs of the entities remain the same as they were in the primary site. It means that if you are using a site that contains entities and runs operations during the DR process, you risk duplicated names and, with low probability, duplicated IDs. Also, the hosts may not be available to handle the DR and the operation may fail.
Another scenario could be to create, inside the Site B environment, another data center named DC1-DR; I think I would also have to create the same logical networks as DC1 (and DC2, incidentally), and in case of DR I would have to take one of the hosts out of DC2 and assign it to DC1-DR....
This is the best option for DR: an isolated data center dedicated to the DR scenario. It is a trade-off - resources vs. robustness.
Opinions?
Thanks in advance, Gianluca
-- Regards, Eyal Shenitzky

On Thu, Jul 25, 2019 at 7:37 AM Eyal Shenitzky <eshenitz@redhat.com> wrote:
You can use the following manual to understand what is required for the DR process -
https://ovirt.org/documentation/disaster-recovery-guide/active_passive_overv...
Thanks, Eyal. I was already using it as a reference, but I had an older PDF version, so I downloaded the new one (updated 19/06, while the one I was using was from mid-May; probably 4.3.3 vs 4.3.4...).
You need the following entities in the secondary site:
[snip] OK
It means you should have at least a dedicated host to perform all the operations on the secondary site (if you have many running VMs you will need more than one host in order to provide a full backup solution).
OK
The IDs of the entities remain the same as they were in the primary site. It means that if you are using a site that contains entities and runs operations during the DR process, you risk duplicated names and, with low probability, duplicated IDs.
Also, the hosts may not be available to handle the DR and the operation may fail.
Got it
Another scenario could be to create, inside the Site B environment, another data center named DC1-DR; I think I would also have to create the same logical networks as DC1 (and DC2, incidentally), and in case of DR I would have to take one of the hosts out of DC2 and assign it to DC1-DR....
This is the best option for DR: an isolated data center dedicated to the DR scenario. It is a trade-off - resources vs. robustness.
Thanks for the confirmation. The first test I'm going to try is the "discreet failover test". I noticed two things inside the guide (both the old and the updated one):

a) In section B.1 it contains the point:
2. Run the command to fail over to the secondary site:
# ansible-playbook playbook --tags "fail_over"
For more information, see Section 3.5, “Execute a Failback”.
I think it is an error and should instead read: "For more information, see Section 3.3, “Execute a Failover”." Correct? If so, let me know and I can open a documentation bug.

b) It is not clear what differentiates a "failover" action from a "discreet failover test" action. In both descriptions (3.3 and B.1) you in fact run
# ansible-playbook playbook --tags "fail_over"
where the playbook name is the "dr-rhv-failover.yml" example created as described in the previous section, 3.2.3 "Create the Failover and Failback Playbooks" (I paste a minimal sketch of such a playbook at the end of this message). Is this correct/expected? Is the only difference the notice in B.1 to take care to isolate resources so they do not collide with the primary ones that are still active?

One final note/doubt:
c) In section 3.3 "Execute a Failover" there is the sentence:
" IMPORTANT Sanlock must release all storage locks from the replicated storage domains before the failover process starts. These locks should be released automatically approximately 80 seconds after the disaster occurs. "
Can you elaborate on this? Does it mean that on the replicated LUN I will find the locks generated by sanlock (on the primary hosts when active), and so when the replicated LUN is connected to the DR environment, the sanlock daemon of the DR host(s), which started at boot right after wdmd, will take care of those locks and remove them? What about the "approximately 80 seconds"?

btw: is there a "master" function too regarding the various sanlock daemons running on the available hosts?

Thanks, Gianluca
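P.S. The failover playbook I mean follows the 3.2.3 example; this is only a minimal sketch from my reading of the guide (the role name and the dr_target_host/dr_source_map variables are taken from the guide's example, so double check them against your installed version of the role):

  ---
  # dr-rhv-failover.yml - fail over from the primary mapping to the secondary site
  - name: oVirt DR failover
    hosts: localhost
    connection: local
    vars:
      dr_target_host: secondary          # site definition in the var file to fail over TO
      dr_source_map: primary             # site definition the VMs are coming FROM
    vars_files:
      - disaster_recovery_maps.yml       # the generated mapping/var file
      - passwords.yml                    # engine admin passwords (ansible-vault encrypted)
    roles:
      - oVirt.disaster-recovery

and it is run with the tag mentioned above:

  # ansible-playbook dr-rhv-failover.yml --tags "fail_over"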

On Thu, Jul 25, 2019 at 1:50 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Jul 25, 2019 at 7:37 AM Eyal Shenitzky <eshenitz@redhat.com> wrote:
You can use the following manual to understand what is required for the DR process -
https://ovirt.org/documentation/disaster-recovery-guide/active_passive_overv...
Thanks, Eyal. I was already using it as a reference, but I had an older PDF version, so I downloaded the new one (updated 19/06, while the one I was using was from mid-May; probably 4.3.3 vs 4.3.4...).
You need the following entities in the secondary site:
[snip] OK
It means you should have at least a dedicated host to perform all the operations on the secondary site (if you have many running VMs you will need more than one host in order to provide a full backup solution).
OK
The IDs of the entities remain the same as they were in the primary site. It means that if you are using a site that contains entities and runs operations during the DR process, you risk duplicated names and, with low probability, duplicated IDs.
Also, the hosts may not be available to handle the DR and the operation may fail.
Got it
Another scenario could be to create, inside the Site B environment, another data center named DC1-DR; I think I would also have to create the same logical networks as DC1 (and DC2, incidentally), and in case of DR I would have to take one of the hosts out of DC2 and assign it to DC1-DR....
This is the best option for DR: an isolated data center dedicated to the DR scenario. It is a trade-off - resources vs. robustness.
Thanks for the confirmation. The first test I'm going to try is the "discreet failover test". I noticed two things inside the guide (both the old and the updated one): a) In section B.1 it contains the point:
2. Run the command to fail over to the secondary site:
# ansible-playbook playbook --tags "fail_over"
For more information, see Section 3.5, “Execute a Failback”.
I think it is an error and should instead read: "For more information, see Section 3.3, “Execute a Failover”." Correct? If so, let me know and I can open a documentation bug.
b) It is not clear what differentiates a "failover" action from a "discreet failover test" action. In both descriptions (3.3 and B.1) you in fact run # ansible-playbook playbook --tags "fail_over" where the playbook name is the "dr-rhv-failover.yml" example created as described in the previous section, 3.2.3 "Create the Failover and Failback Playbooks" (I paste a minimal sketch of such a playbook at the end of this message).
Is this correct/expected? Is the only difference the notice in B.1 to take care to isolate resources so they do not collide with the primary ones that are still active?
One final note/doubt: c) in section 3.3 "Execute a Failover" there is the sentence
Please note that automation Python scripts were created to facilitate the DR process. You can find them under path/to/your/dr/folder/files. You can use those scripts to generate the mapping, test the generated mapping, and start the failover/failback. I strongly recommend using them.
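If I remember the file names correctly, from that directory the flow is something like the following (treat the exact script name and sub-commands as indicative and check the files shipped with your version of the role):

  $ ./ovirt-dr generate    # build the mapping var file by querying the primary engine
  $ ./ovirt-dr validate    # sanity-check the generated mapping against both engines
  $ ./ovirt-dr failover    # start the failover to the secondary site
  $ ./ovirt-dr failback    # fail back to the primary site once it is restored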
" IMPORTANT Sanlock must release all storage locks from the replicated storage domains before the failover process starts. These locks should be released automatically approximately 80 seconds after the disaster occurs. " Can you elaborate on this? Does it mean that on the replicated lun I will find the locks generated by sanlock (on primary hosts when active) and so when the replicated lun is going to be connected to the DR environment, the sanlock daemon of the DR host/s, that has started at boot right after wdmd, will take care of those locks and remove them? What about the "approximately 80 seconds"?
btw: is there a "master" function too regarding the various sanlock daemons running on the available hosts?
I am not sure what you mean by the "master" function, but sanlock will release its locks after approximately 80 seconds. In a regularly operating environment sanlock monitors the domains (acquiring and releasing its leases); in case of disaster, it will take sanlock approximately 80 seconds to expire those acquired leases. Because your domains are replicated, the sanlock leases are also replicated on those domains, and we have to wait for them to expire too.
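(As far as I know, the ~80 seconds comes from sanlock's default 10-second I/O timeout: a lease owner that stops renewing is considered expired after roughly 8 io_timeouts, i.e. about 80 seconds. If you want to see the lockspaces and leases a DR host holds after attaching the replicated domain, you can run something like:

  # sanlock client status

the exact output format depends on your sanlock version.)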
Thanks, Gianluca
-- Regards, Eyal Shenitzky