On Thu, Jul 25, 2019 at 1:50 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:

On Thu, Jul 25, 2019 at 7:37 AM Eyal Shenitzky <eshenitz@redhat.com> wrote:

You can use the following manual to understand what is required for the DR process -
https://ovirt.org/documentation/disaster-recovery-guide/active_passive_overview.html

Thanks Eyal. I was already using it as a reference, but I had an older PDF version, so I downloaded the new one (updated 19/06, while the one I was using was from mid-May; probably 4.3.3 vs 4.3.4...)

You need the following entities in the secondary site:

[snip]
OK

It means you should have at least one dedicated host to perform all the operations on the secondary site (if you have many running VMs, you will need more than one host in order to provide a full backup solution).
OK

The IDs of the entities remain the same as they were in the primary site.
This means that if you use a site that already contains entities and runs operations during the DR process, you risk duplicate names and, with low probability, duplicate IDs.

Also, the host may not be available to handle the DR, and the operation may fail.
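
As a rough illustration of the collision risk described above, the sketch below compares the names and IDs of the entities coming from the primary site with those already present in the secondary site. The `find_collisions` helper, the dictionary layout (entity ID mapped to entity name) and the sample data are all hypothetical; in practice the data would have to come from the engine inventory of each site.

```python
def find_collisions(primary, secondary):
    """Return (duplicate_names, duplicate_ids) shared by two sites.

    Both arguments map entity ID -> entity name (a hypothetical layout,
    not something the DR tooling produces in this form).
    """
    duplicate_ids = sorted(set(primary) & set(secondary))
    duplicate_names = sorted(set(primary.values()) & set(secondary.values()))
    return duplicate_names, duplicate_ids


# Made-up sample data, for illustration only.
primary_vms = {"id-001": "web01", "id-002": "db01"}
secondary_vms = {"id-900": "web01", "id-901": "backup01"}

names, ids = find_collisions(primary_vms, secondary_vms)
print("duplicate names:", names)
print("duplicate IDs:", ids)
```

With a dedicated, empty DR data center both lists would be empty by construction, which is the robustness side of the trade-off.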

Got it

Another scenario could be to create, inside the Site B environment, another data center named DC1-DR. I think I would also have to create the same logical networks as DC1 (and, incidentally, DC2), and in case of DR I would have to remove one of the hosts from DC2 and assign it to DC1-DR....

This option is the best option for DR.
An isolated Data-center that is dedicated to a DR scenario.
It is a trade-off - resources VS robustness

Thanks for the confirmation.
The first test I'm going to try is the "discreet failover test".
I notice two things inside the guide (both the old and the updated one):
a) Section B.1 contains this step:

2. Run the command to fail over to the secondary site:
# ansible-playbook playbook --tags "fail_over"
For more information, see Section 3.5, “Execute a Failback” .

I think it is an error and should contain instead:
"For more information, see Section 3.3, “Execute a Failover” ."
Correct? Let me know if I should open a documentation bug.

b) It is not clear what differentiates a "failover" action from a "discreet failover test" action.
In both descriptions (3.3 and B.1) you in fact run
# ansible-playbook playbook --tags "fail_over"
where the playbook is the "dr-rhv-failover.yml" example created as described in the previous section:
3.2.3. Create the Failover and Failback Playbooks

Is this correct/expected? Is the only difference the notice in B.1 to isolate the resources so that they do not collide with the primary ones that are still active?
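
To make the question concrete, here is a minimal sketch of the invocation that both sections of the guide appear to describe. The playbook name and the tag are the ones quoted above; the `build_dr_command` helper is hypothetical and only assembles the command line, without running anything.

```python
import shlex


def build_dr_command(playbook="dr-rhv-failover.yml", tag="fail_over"):
    """Assemble the ansible-playbook command line quoted in the guide.

    The helper itself is illustrative; it does not execute the playbook.
    """
    return ["ansible-playbook", playbook, "--tags", tag]


cmd = build_dr_command()
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

If the command really is identical in both cases, any difference between a real failover and a discreet test would have to come from the environment (isolated storage and networks), not from the invocation.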

One final note/doubt:
c) in section 3.3 "Execute a Failover" there is the sentence

Please note that automation Python scripts were created in order to facilitate the DR process.

You can find them under - path/to/your/dr/folder/files.

You can use those scripts to generate the mapping, test the generated mapping and start the failover/failback.

I strongly recommend using them.

"
IMPORTANT
Sanlock must release all storage locks from the replicated storage domains before the
failover process starts. These locks should be released automatically approximately 80
seconds after the disaster occurs.
"
Can you elaborate on this? Does it mean that on the replicated LUN I will find the locks generated by sanlock (on the primary hosts when active), so that when the replicated LUN is connected to the DR environment, the sanlock daemon of the DR host/s, which started at boot right after wdmd, will take care of those locks and remove them? What about the "approximately 80 seconds"?

btw: is there a "master" function too regarding the various sanlock daemons running on the available hosts?

I am not sure what you mean by the "master" function, but sanlock will release its locks after approximately 80 seconds.

It means that in a normally operating environment, sanlock monitors the domains (it acquires and releases its leases). In case of a disaster, it takes approximately 80 seconds for those acquired leases to expire.

Because your domains are replicated, the sanlock leases are replicated along with them, and we should wait for them to expire.
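
A minimal sketch of that waiting period, assuming the approximate 80-second figure from the guide and a known timestamp for the disaster. The `seconds_until_leases_expire` helper is hypothetical and purely timer-based; it does not inspect the real lease state on the storage domain.

```python
import time

# Approximate lease expiry quoted in the guide; treat it as an assumption.
SANLOCK_LEASE_EXPIRY_SECONDS = 80


def seconds_until_leases_expire(disaster_time, now=None,
                                expiry=SANLOCK_LEASE_EXPIRY_SECONDS):
    """Return how many seconds are left before the leases should have expired.

    disaster_time and now are UNIX timestamps; now defaults to the current time.
    """
    if now is None:
        now = time.time()
    elapsed = now - disaster_time
    return max(0.0, expiry - elapsed)


# Example with fixed timestamps: 30 seconds have passed since the disaster.
remaining = seconds_until_leases_expire(disaster_time=1000.0, now=1030.0)
print(f"wait {remaining:.0f} more seconds before starting the failover")
```

A wrapper could sleep for the returned value before launching the failover playbook, so that the replicated leases are not still held when the domains are attached.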

Thanks,
Gianluca

Regards,

Eyal Shenitzky