[ovirt-users] Need advise/help/ideas to implement a two sites disaster-recovery with ovirt

Tue Nov 21 18:42:42 UTC 2017

On Tue, Nov 21, 2017 at 2:40 PM, wodel youchi <wodel.youchi at gmail.com> wrote:
>
> Hi,
>
> We want to implement two sites with oVirt and have disaster recovery: if site one is unreachable, site two takes over.
>
> We will be using iSCSI for Data domains
>
> Our initial idea is :
> - install the first site :
>    - self-hosted engine
>    - iscsi blocks for data domains
>
> - install the second site:
>    - self-hosted engine.
>    - iscsi blocks for data domain.
>
> Use the snapshot synchronization of the disk-array to sync iscsi blocks/volumes.
>
> The two self-hosted engines will be independent.
> In case of a failure :
>  - break the synchronization
>  - Access the data domain in site two
>  - start the VMs.
>
>
> oVirt does not manage replication of VMs itself. Someone may argue that it's not its purpose, but what about selecting the VMs to replicate? In the case of NFS we may have some access to the VM disk files (I don't really know if it's possible), but in block mode with LVM on top, things become complicated.
>
> So if someone has implemented such an architecture, could you please share your experience and give advice and ideas on how best to implement this?

Hello,

what you're describing is exactly what we designed and implemented in a
test environment; in a few months we'll move it to production.

We have two EMC VNX arrays backing our storage, one in our main
site and one in our DR site. We're using FC, not iSCSI, but the
underlying idea is the same.

We have this setup in the main site:

- hosted-engine
- one storage domain not replicated (needed for completing the setup)
- some storage domains replicated via storage replication (EMC
MirrorView). From the oVirt point of view they are standard storage
domains.
- some networks, let's call them A, B, C.

On DR side:
- hosted-engine
- one storage domain not replicated (needed for completing the setup)
- exactly the same networks, configured with the same names
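
Since the DR side depends on the networks having the same names as on the main site, a small check can catch a missing one before a disaster does. This is only a sketch: the function name is made up for illustration, the lists of names are assumed to be collected by hand or from each engine, and the comparison is case-sensitive.

```python
from typing import Iterable, Set


def missing_networks(main_site: Iterable[str], dr_site: Iterable[str]) -> Set[str]:
    """Return the network names defined on the main site but absent on the DR site."""
    return set(main_site) - set(dr_site)


# Example with the networks A, B, C mentioned above; the DR list is hypothetical.
main = ["A", "B", "C"]
dr = ["A", "C"]
print(sorted(missing_networks(main, dr)))  # prints ['B']
```

An empty result means every main-site network name also exists on the DR side.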

Data is replicated with an RPO of 1 hour, so we need the OVF store on
each storage domain to be up to date when replication runs.
We call the API to update the ovf_store of each replicated SD (as
Yaniv suggested in
https://www.mail-archive.com/users@ovirt.org/msg43896.html ):

POST api/storagedomains/{sd_id}/updateovfstore

This way not only the VM disks are replicated, but also the VM
hardware information.
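
The updateovfstore call above could be prepared in Python roughly as follows. This is a minimal sketch under assumptions, not our actual tooling: the engine base URL, the bearer-token authentication, the empty `<action/>` body and the function name are all assumptions for illustration. The request is only built here, not sent.

```python
import urllib.request


def build_update_ovf_request(api_base: str, sd_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks the engine to refresh the OVF store."""
    url = f"{api_base.rstrip('/')}/storagedomains/{sd_id}/updateovfstore"
    # Assumption: the action takes an empty <action/> XML body.
    req = urllib.request.Request(url, data=b"<action/>", method="POST")
    req.add_header("Content-Type", "application/xml")
    req.add_header("Accept", "application/xml")
    # Assumption: a token obtained elsewhere is passed as a bearer token.
    req.add_header("Authorization", f"Bearer {token}")
    return req


# Hypothetical engine URL, storage domain id and token.
req = build_update_ovf_request("https://engine.example.com/ovirt-engine/api", "1234", "TOKEN")
print(req.get_method(), req.full_url)
```

To actually send it you would pass the request to urllib.request.urlopen(), once for each replicated SD, ahead of the replication window.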

What happens in case of disaster?

On the recovery side we set the disks on the VNX storage as R/W accessible
(Promote to Primary Image). The disks are then made available to the
hosts, which can then use "Import Storage" to attach them.
You'll be warned that the SDs are marked as up in another datacenter, but
you can safely ignore that message because the "other" datacenter is
now offline due to the disaster.
Once the SDs are attached, you activate them (SDs are imported in
maintenance mode). Once they are active, go to each SD, select the "VM
Import" tab, and import any VMs stored on it.
If you created all the required networks with the same names, the VMs will
already be connected to their networks, so powering them on is the last
action required.
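
The order of the recovery steps above matters, and each step only makes sense if the previous one succeeded. A minimal sketch of how that sequence might be driven from Python, assuming each step is wrapped in a callable that returns True on success. The runner and the placeholder lambdas are hypothetical and stand in for the real array and engine operations.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_recovery(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first one that reports failure."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed


# Placeholder actions: each lambda stands in for a real array or engine call.
steps: List[Step] = [
    ("promote replicated images to primary (R/W)", lambda: True),
    ("import the storage domains", lambda: True),
    ("activate the storage domains", lambda: True),
    ("import the VMs from each storage domain", lambda: True),
    ("power on the VMs", lambda: True),
]
print(run_recovery(steps))
```

The returned list shows how far the recovery got, which tells you where to resume by hand if a step fails.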

I did it manually and it works. I'm now waiting for the necessary
support from my colleagues to start developing a full DR solution that
replicates this behavior through Python.

Let me know if you need any clarification,

Luca

-- 
"It is absurd to employ men of excellent intelligence to perform
calculations that could be entrusted to anyone if machines were
used"
Gottfried Wilhelm von Leibnitz, philosopher and mathematician (1646-1716)

"The Internet is the largest library in the world.
The problem is that the books are all scattered on the floor"
John Allen Paulos, mathematician (1945-living)

Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <lorenzetto.luca at gmail.com>