On Wed, Apr 15, 2020 at 4:52 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
[snip]
Snippet for hosted engine hosts shutdown:

        - name: Shutdown of HE hosts
          command: >-
            ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
            -i /etc/pki/ovirt-engine/keys/engine_id_rsa -p {{ item.ssh.port }}
            -t root@{{ item.address }} '{{ he_shutdown_cmd }}'
          async: 1000
          poll: 0
          with_items:
            - "{{ he_hosts }}"

where the he_shutdown_cmd var is defined as:

        he_shutdown_cmd: >-
          while hosted-engine --vm-status | grep "\"vm\": \"up\"" >/dev/null;
          do sleep 5;
          done;
          sanlock client shutdown -f 1;
          shutdown -h now

Snippet for the Engine VM shutdown:

        - name: Shutdown engine host/VM
          command: shutdown -h now
          async: 1000
          poll: 0

[snip]

Could it be useful to insert the /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh command, as suggested by Strahil, after the sanlock one, in the case of a GlusterFS storage domain?
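Something like this is what I have in mind, just a sketch assuming the script path above and that it is only relevant on GlusterFS-based deployments (note the trailing ";" so that the folded YAML still produces a valid command chain):

        he_shutdown_cmd: >-
          while hosted-engine --vm-status | grep "\"vm\": \"up\"" >/dev/null;
          do sleep 5;
          done;
          sanlock client shutdown -f 1;
          /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh;
          shutdown -h now

If the role also has to keep working on non-Gluster hosts, the call could probably be guarded with something like
[ -x /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh ] && /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh;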

Also, one further note:
the role uses the ovirt_host_facts module. I get this deprecation warning when I use it and debug its output:
            {
                "msg": "The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts",
                "version": "2.13"
            }

So perhaps switching to ovirt_host_info should be considered? Any plans?
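In case it helps, a minimal sketch of how that change could look; the pattern value and the ovirt_auth variable here are only placeholders, not necessarily what the role really uses. Since ovirt_host_info no longer sets ansible_facts, its result has to be registered and read from the ovirt_hosts key:

        - name: Collect host information
          ovirt_host_info:
            auth: "{{ ovirt_auth }}"
            pattern: "status=up"
          register: host_info

        - name: Show the collected hosts
          debug:
            var: host_info.ovirt_hosts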

Thanks for reading.

Gianluca



Hello,
I would like to follow up on this to get a better experience.
The environment is a physical 4.3.10 single-host HCI setup that shows the same problems as above.
So I modified the role file, adding the Gluster stop script after the sanlock shutdown:

[root@ovengine tasks]# pwd
/root/roles/ovirt.shutdown_env/tasks

[root@ovengine tasks]# diff main.yml main.yml.orig
79d78
<           /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
[root@ovengine tasks]#

Now the poweroff completes, even if I get these errors about stopping the swap and Gluster brick filesystems:
https://drive.google.com/file/d/1oh0sNC3ta5qP0KAcibTdDc5N_lpil8pS/view?usp=sharing 

When I power the server on again later, the environment starts OK in global maintenance, and when I exit it, the engine and Gluster volumes start OK.
There is still a 2-3 minute delay (I already opened a thread about this) between the moment the storage domains appear as up in the web admin GUI and the moment they are truly up (the /rhev/data-center/mnt/glusterSD/... file systems mounted). So if you try to start a VM in the meantime, you get an error because the disks are not found...
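For now I simply wait; as a crude check before starting VMs (just an untested idea, the path prefix is the mount point mentioned above) one could loop until the Gluster storage domain file systems are really mounted:

        until grep -q /rhev/data-center/mnt/glusterSD /proc/mounts; do
          sleep 10
        done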

Comments?
Gianluca