Gianluca,

Thank you so much for the great feedback, it is very much appreciated! I too have to carve out some time to test some of these ideas more thoroughly, but I wanted to offer some of my initial thoughts anyway.

My goal is for the playbook(s) to be as simple as possible, with as little configuration as possible. Ideally I'd love for the playbooks to be runnable from any host, without requiring a connection to the engine database or access to storage in order to verify export status.

1. Vault: I am aware of this and have seen this method used in other oVirt/RHEV documentation. The reason I left it out is that I want to run the playbook on cron without being prompted for a password. This could potentially be solved by specifying the vault password as an environment variable in cron, but in the end the password still needs to be provided somewhere for the playbook to work hands-off. I suppose it's a matter of which is the most secure and recommended way to do so. Open to suggestions here.
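For example, something along these lines in root's crontab might be all that's needed, assuming the vault password lives in a file readable only by root (the paths here are just placeholders):

# nightly OVA backups at 02:30; vault password read from a root-only file
30 2 * * * ansible-playbook --vault-password-file=/root/.vault_pw /root/ovirt-backup/backup_ovirt_vms.yml >> /var/log/ovirt-backup.log 2>&1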

2. Blocks: I am aware of the use of blocks in Ansible but don't personally have much direct experience using them. Your idea to use a block for SSO token seems reasonable and likely should be implemented. I need to test that out.

3. Export Timing: I like your solution for probing the DB for export status and I'd like to spend some more time looking at it. I wonder if it's perhaps a bit too complex and whether there may be an easier way that doesn't involve interacting directly with the engine database. One idea I had, which I think could work, is to use the ovirt_event_info module (https://docs.ansible.com/ansible/latest/modules/ovirt_event_info_module.html#ovirt-event-info-module). I believe this module could be used in an until/retries loop, waiting until the message "Vm X was exported successfully as a Virtual Appliance to path..." appears in the VM's event messages. To make sure we don't pick up prior events, we could register the current event index ID in a variable and then use the "from_" parameter to only search new events. I do think something like this could work, but I haven't had enough time to thoroughly test it and I'm not sure it's the best possible solution. There may be an even easier way to determine the export status using existing oVirt Ansible modules, but I have not found one yet. What are your thoughts on this method?
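Roughly, I'm picturing something like this in export_vm.yml, right after the export task (completely untested; the exact event wording, the from_ handling and the module parameters would all need to be verified):

- name: Record the id of the most recent event before the export
  ovirt_event_info:
    auth: "{{ ovirt_auth }}"
    max: 1
  register: last_event
  # assuming the first event returned is the most recent one

- name: Wait for the "exported successfully as a Virtual Appliance" event
  ovirt_event_info:
    auth: "{{ ovirt_auth }}"
    # only look at events newer than the one recorded above
    from_: "{{ last_event.ovirt_events[0].id | int }}"
  register: vm_events
  until: vm_events.ovirt_events | map(attribute='description') | select('search', 'Vm ' ~ item ~ ' was exported successfully') | list | length > 0
  retries: 30
  delay: 60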

I'd also be interested to hear if you have any thoughts or opinions on ways to make the backup retention policy more versatile.

Thanks again for your feedback!

- Jayme



On Tue, Feb 18, 2020 at 8:15 AM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Feb 10, 2020 at 5:01 PM Jayme <jaymef@gmail.com> wrote:
I've been part of this mailing list for a while now and have received a lot of great advice and help on various subjects. I read the list daily and one thing I've noticed is that many users are curious about backup options for oVirt (myself included). I wanted to share with the community a solution I've come up with to easily backup multiple running oVirt VMs to OVA format using some basic Ansible playbooks. I've put together a blog post detailing the process which also includes links to a Github repo containing the playbooks here: https://blog.silverorange.com/backing-up-ovirt-vms-with-ansible-4c2fca8b3b43

Any feedback, suggestions or questions are welcome. I hope this information is helpful.

Thanks!

- Jayme

 
Hi Jayme,
sorry in advance for the long mail, where I try to give details; I don't know your Ansible experience.
A very nice and clean article indeed, with useful details (apart from the text not being justified: I prefer justified text, but YMMV), and pretty fair about vProtect's work, covering both the pros and cons of their solution.
I met Pawel Maczka from vProtect during the oVirt Summit last year and was able to appreciate his kindness, skill and efforts in integrating with oVirt/RHV.

That said, I have some suggestions for you. In the coming days I may be working on a similar need for a customer, so it would be nice to share efforts and hopefully results... ;-)
I don't have much time this week, but if you can elaborate on and test what's below, we can compare notes.

1) engine parameters
You could use Ansible Vault to encrypt the credential files, for better security, so that you can share the playbook files without having to worry about sensitive information.
In my case I put the username, password, oVirt manager FQDN and oVirt CA file all in one file and then encrypt it (the engine database credentials too, see below).
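For example, something like this (the file name is only illustrative):

ansible-vault encrypt ovirt_credentials.yml

and the playbook then loads it through a vars_files entry:

  vars_files:
    - ovirt_credentials.yml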
Then I create a securely protected file named "vault_file" where I store the vault password, and I invoke the playbook with:

ansible-playbook  --vault-password-file=vault_file backup_ovirt_vms.yml

Alternatively, you can be prompted for the vault password each time you run the playbook (--ask-vault-pass).

2) The best practice for using the oVirt SSO token in Ansible is to use a block of this kind:

  tasks:

    - name: Ansible block to export as OVA
      block:

        - name: Obtain SSO token using username/password credentials
          ovirt_auth:
            url: https://{{ url_name }}/ovirt-engine/api
            username: "{{ ovirt_username }}"
            password: "{{ ovirt_password }}"
            ca_file: "{{ ovirt_ca }}"

        - name: "Backup VMs"
          include_tasks: export_vm.yml
          loop: "{{ vms }}"

      always:

        - name: Revoke SSO token
          ovirt_auth:
            state: absent
            ovirt_auth: "{{ ovirt_auth }}"

This way, thanks to the "always" section, you are sure the token is revoked even if one of the tasks fails.

3) Managing the timing of the export to OVA, which from the Ansible job's point of view starts and appears to complete immediately.
This is possibly overkill, and I don't know whether the ovirt_job module could somehow do the same, but I try to solve it using the engine DB.

Please note that the engine DB credentials are on the engine, inside the file:
/etc/ovirt-engine/engine.conf.d/10-setup-database.conf

and by default you can connect remotely to the database as the engine user, with the password hashed (md5) over the network, thanks to pg_hba.conf in the directory
/var/opt/rh/rh-postgresql10/lib/pgsql/data/pg_hba.conf

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    engine          engine          0.0.0.0/0               md5

There are two tables involved:
the "job_subject_entity" table, which you query with entity_id set to the id of the VM you are exporting; from this table you get the related job_id
the "job" table, which you query for the job_id obtained from the previous query (possibly refining the filter if you have more concurrent jobs running against your VM, e.g. on action_type = 'ExportVmToOva' ... TBD)

E.g., while the export is running:

engine=> \x
Expanded display is on.

engine=> select * from job_subject_entity where entity_id='442a1321-e366-4ea2-81bc-cad6e860a517';
-[ RECORD 1 ]-------------------------------------
job_id      | 1d4797f3-b1f9-4c19-8c8d-fb8c019399b1
entity_id   | 442a1321-e366-4ea2-81bc-cad6e860a517
entity_type | VM

engine=> select * from job where job_id='1d4797f3-b1f9-4c19-8c8d-fb8c019399b1';
-[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------
job_id                | 1d4797f3-b1f9-4c19-8c8d-fb8c019399b1
action_type           | ExportVmToOva
description           | Exporting VM c8 as an OVA to /rhev/data-center/mnt/10.4.192.69:_export_ovirt/dump/c8.ova on Host ov301
status                | STARTED
owner_id              | 58823863-00d4-0257-0094-0000000002f3
visible               | t
start_time            | 2020-02-18 12:00:16.629+01
end_time              |
last_update_time      | 2020-02-18 12:00:33.905+01
correlation_id        | 2bc48fc7-4af7-4287-a090-7245bac6f84b
is_external           | f
is_auto_cleared       | t
engine_session_seq_id | 1091

The status field is STARTED and the end_time has no value

When completed:

engine=> select * from job where job_id='1d4797f3-b1f9-4c19-8c8d-fb8c019399b1';
-[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------
job_id                | 1d4797f3-b1f9-4c19-8c8d-fb8c019399b1
action_type           | ExportVmToOva
description           | Exporting VM c8 as an OVA to /rhev/data-center/mnt/10.4.192.69:_export_ovirt/dump/c8.ova on Host ov301
status                | FINISHED
owner_id              | 58823863-00d4-0257-0094-0000000002f3
visible               | t
start_time            | 2020-02-18 12:00:16.629+01
end_time              | 2020-02-18 12:11:04.795+01
last_update_time      | 2020-02-18 12:11:04.795+01
correlation_id        | 2bc48fc7-4af7-4287-a090-7245bac6f84b
is_external           | f
is_auto_cleared       | t
engine_session_seq_id | 1091

engine=>

The status field is FINISHED (or presumably FAILED if something went wrong; to be tested and verified) and end_time has the completion (or perhaps failure, TBV) timestamp.
Please note that the records remain in the two tables only for some minutes (it seems 15) after the FINISHED status has been reached.
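If needed, the two lookups could probably also be combined into a single query, which would make the refinement on action_type straightforward; something like this (untested):

select j.status, j.end_time
from job j
  join job_subject_entity jse on jse.job_id = j.job_id
where jse.entity_id = '442a1321-e366-4ea2-81bc-cad6e860a517'
  and j.action_type = 'ExportVmToOva';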

From the Ansible point of view you use the postgresql_query module. export_vm.yml becomes something like this, where you register the ovirt_vm task result and then monitor via PostgreSQL queries:

- name: "Export VM to OVA"
  ovirt_vm:
    auth: "{{ ovirt_auth }}"
    name: "{{ item }}"
    state: exported
    cluster: "{{ cluster }}"
    export_ova:
        host: "{{ host }}"
        filename: "{{ item }}.{{ file_ext }}"
        directory: "{{ in_progress_path }}"
  register: export_result

- name: Debug export vm as ova result
  debug:
    var: export_result

- name: Monitor export job status
  include_tasks: query_task.yml
  vars:
    entity_id: "{{ export_result.id }}"


An example of the registered export_result variable is:

TASK [Debug export vm as ova result] ********************************************************************
ok: [localhost] => {
    "export_result": {
        "changed": true,
        "diff": {
            "after": {},
            "before": {}
        },
        "failed": false,
        "id": "442a1321-e366-4ea2-81bc-cad6e860a517",
        "vm": {
. . .

        }
    }
}

You pass export_result.id as the value of the entity_id variable to the included task.

The query_task.yml task file contains:

---

- name: query job_subject_entity table
  postgresql_query:
    login_host: "{{ dbserver }}"        
    db: "{{ db }}"
    login_user: "{{ username }}"
    login_password: "{{ password }}"
    query: "select * from job_subject_entity where entity_id='{{ entity_id }}'"
  register: job_id

- name: debug query result
  debug:
    var: job_id

- name: query job table
  postgresql_query:
    login_host: "{{ dbserver }}"        
    db: "{{ db }}"
    login_user: "{{ username }}"
    login_password: "{{ password }}"
    query: "select * from job where job_id='{{ job_id.query_result[0].job_id }}'"
  register: job_status
  until: job_status.query_result[0].status == "FINISHED"
  retries: 15
  delay: 60
   
- name: debug query result
  debug:
    var: job_status

...

In my case I set 15 retries with a delay of 60 seconds between them.
Below is a complete run for one VM (whose export takes about 10 minutes).

After the line

TASK [query job table] *****************************************************************************

the output will sit at the prompt until the export has finished, filling up with lines of the type

FAILED - RETRYING: query job table (15 retries left).

FAILED - RETRYING: query job table (14 retries left).

etc.

depending on the delay and retries you set.

$ ansible-playbook --vault-password-file=vault_pw  backup_ovirt_vms.yml

PLAY [localhost] ***********************************************************************************

TASK [Gathering Facts] *****************************************************************************
ok: [localhost]

TASK [Obtain SSO token using username/password credentials] ****************************************
ok: [localhost]

TASK [Backup VMs] **********************************************************************************
included: /home/g.cecchi/ovirt/export_vm.yml for localhost

TASK [Export VM to OVA] ****************************************************************************
changed: [localhost]

TASK [Debug export vm as ova result] ***************************************************************
ok: [localhost] => {
    "export_result.id": "442a1321-e366-4ea2-81bc-cad6e860a517"
}

TASK [Monitor export job status] *******************************************************************
included: /home/g.cecchi/ovirt/query_task.yml for localhost

TASK [query job_subject_entity table] **************************************************************
ok: [localhost]

TASK [debug query result] **************************************************************************
ok: [localhost] => {
    "job_id": {
        "changed": false,
        "failed": false,
        "query": "select * from job_subject_entity where entity_id='442a1321-e366-4ea2-81bc-cad6e860a517'",
        "query_result": [
            {
                "entity_id": "442a1321-e366-4ea2-81bc-cad6e860a517",
                "entity_type": "VM",
                "job_id": "8064059e-4b74-48a2-83be-a7dfd5067172"
            }
        ],
        "rowcount": 1,
        "statusmessage": "SELECT 1"
    }
}

TASK [query job table] *****************************************************************************
FAILED - RETRYING: query job table (15 retries left).
FAILED - RETRYING: query job table (14 retries left).
FAILED - RETRYING: query job table (13 retries left).
FAILED - RETRYING: query job table (12 retries left).
FAILED - RETRYING: query job table (11 retries left).
FAILED - RETRYING: query job table (10 retries left).
FAILED - RETRYING: query job table (9 retries left).
FAILED - RETRYING: query job table (8 retries left).
FAILED - RETRYING: query job table (7 retries left).
FAILED - RETRYING: query job table (6 retries left).
FAILED - RETRYING: query job table (5 retries left).
ok: [localhost]

TASK [debug query result] **************************************************************************
ok: [localhost] => {
    "job_status": {
        "attempts": 12,
        "changed": false,
        "failed": false,
        "query": "select * from job where job_id='8064059e-4b74-48a2-83be-a7dfd5067172'",
        "query_result": [
            {
                "action_type": "ExportVmToOva",
                "correlation_id": "ed3d7454-c52c-46ce-bfbe-97ee8e1e885e",
                "description": "Exporting VM c8 as an OVA to /rhev/data-center/mnt/10.4.192.69:_export_ovirt/dump/c8.ova on Host ov301",
                "end_time": "2020-02-18T13:00:19.314000+01:00",
                "engine_session_seq_id": 1093,
                "is_auto_cleared": true,
                "is_external": false,
                "job_id": "8064059e-4b74-48a2-83be-a7dfd5067172",
                "last_update_time": "2020-02-18T13:00:19.314000+01:00",
                "owner_id": "58823863-00d4-0257-0094-0000000002f3",
                "start_time": "2020-02-18T12:49:36.862000+01:00",
                "status": "FINISHED",
                "visible": true
            }
        ],
        "rowcount": 1,
        "statusmessage": "SELECT 1"
    }
}

TASK [Revoke SSO token] ****************************************************************************
ok: [localhost]

PLAY RECAP *****************************************************************************************
localhost                  : ok=11   changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  

For these preliminary tests I'm using an export domain, even if deprecated, so that I can use whatever host I want in the cluster, and on the host I have:

[root@ov301 dump]# ll
total 35650236
-rw-------. 1 root root 21478390272 Feb 18 13:00 c8.ova

I think there is an open bug about the root permissions of this file, linked below. Possibly they depend on the way the export domain is mounted?
https://bugzilla.redhat.com/show_bug.cgi?id=1645229

Possible improvements: intercepting failures and, in case the automatically created snapshot has not been deleted, managing it (consolidating)...
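For instance, inside export_vm.yml the export could be wrapped in its own block with a rescue section that looks for a leftover auto-generated snapshot and removes it; something along these lines (untested, and the description match is only a guess to be verified against a real failed export):

- name: Export VM to OVA, cleaning up the snapshot on failure
  block:

    - name: "Export VM to OVA"
      ovirt_vm:
        auth: "{{ ovirt_auth }}"
        name: "{{ item }}"
        state: exported
        cluster: "{{ cluster }}"
        export_ova:
            host: "{{ host }}"
            filename: "{{ item }}.{{ file_ext }}"
            directory: "{{ in_progress_path }}"

  rescue:

    - name: Get the snapshots of the VM
      ovirt_snapshot_info:
        auth: "{{ ovirt_auth }}"
        vm: "{{ item }}"
      register: snap_info

    - name: Remove the leftover auto-generated export snapshot, if still present
      ovirt_snapshot:
        auth: "{{ ovirt_auth }}"
        vm_name: "{{ item }}"
        snapshot_id: "{{ snap.id }}"
        state: absent
      loop: "{{ snap_info.ovirt_snapshots }}"
      loop_control:
        loop_var: snap
      # the 'OVA' description match is only a guess at how the auto-generated snapshot is named
      when: "'OVA' in (snap.description | default(''))"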

Let me know what you think about it.
HIH,
Gianluca