Hosted Engine Deployment timeout waiting for VM
Hello users,

I am currently trying to deploy the self-hosted engine via the web interface, but it seems stuck at the task "Wait for the local VM" (https://github.com/oVirt/ovirt-ansible-collection/blob/master/roles/hosted_e...). I am unsure what to look at to get more information, as I haven't worked much with Ansible before. Do you have any idea how to debug this?

The temporary IP is added to /etc/hosts and I can also log in to the VM via SSH:

[root@server-005 ~]# cat /etc/hosts
192.168.1.97 ovirt-engine-test.admin.int.rabe.ch # temporary entry added by hosted-engine-setup for the bootstrap VM
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.128.16.5 server-005.admin.int.rabe.ch
10.128.16.6 server-006.admin.int.rabe.ch
10.128.16.7 server-007.admin.int.rabe.ch
#10.128.32.2 ovirt-engine-test.admin.int.rabe.ch
10.132.16.5 server-005.storage.int.rabe.ch
10.132.16.6 server-006.storage.int.rabe.ch
10.132.16.7 server-007.storage.int.rabe.ch

[root@server-005 ~]# ssh ovirt-engine-test.admin.int.rabe.ch
root@ovirt-engine-test.admin.int.rabe.ch's password:
Web console: https://ovirt-engine-test.admin.int.rabe.ch:9090/ or https://192.168.1.97:9090/
Last login: Mon Apr 18 11:33:53 2022 from 192.168.1.1

[root@ovirt-engine-test ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:16:3e:58:7a:a3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.97/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
       valid_lft 2313sec preferred_lft 2313sec
    inet6 fe80::216:3eff:fe58:7aa3/64 scope link
       valid_lft forever preferred_lft forever

Thank you for any tips for debugging.
Jonas
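For gathering more detail while that task is waiting, a minimal approach might look like the sketch below; it assumes the default hosted-engine-setup log location (the same path mentioned later in this thread) and the temporary FQDN added to /etc/hosts above.

```
# Follow the bootstrap_local_vm playbook log on the host while the task is polling
tail -F /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-*.log

# Independently confirm the bootstrap VM answers over SSH at its temporary name
ssh -v root@ovirt-engine-test.admin.int.rabe.ch 'echo connected'
```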
Is it timing out?

Best Regards,
Strahil Nikolov
Yes, that seems to be the case, even though Python also complains about too many open files. Here's the log: https://zerobin.net/?f68ce76deabf1fc3#5fpLHY+mbFd1qMEFh+d0a4qdA0odSs4ZAP9EEN... (/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20220418104538-gkujrm.log).

Regards,
Jonas
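The "too many open files" messages can be cross-checked against the host's descriptor limits; purely as a sanity check, something like this would show them:

```
# Soft limit for open file descriptors in the current shell
ulimit -n
# System-wide maximum number of open files
cat /proc/sys/fs/file-max
```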
Is your hypervisor's SSH hardened? Try without any modifications.

Best Regards,
Strahil Nikolov
Hi, can you please share which version of oVirt you're deploying? 4.4.10? 4.5.0?
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com

Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.
During deploy of hosted-engine 4.5.7 under Almalinux 10.1 x86_64_v2:

```
hosted-engine --deploy --4 --restore-from-file=260204-backup.tar.gz --config-append=260204-answers-deploy.conf
```

Beyond the other issues, there is this "freeze" at "Wait for the local VM".

```
root@ovirt-node4:/var/log/ovirt-hosted-engine-setup# rpm -qa | grep ovirt-hosted-engine-setup
ovirt-hosted-engine-setup-2.7.3-0.0.master.20251020093656.gitb8855de.el10.noarch
```

I patched a module:

```
diff -c /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/bootstrap_local_vm/03_engine_initial_tasks.yml{.orig,}
*** /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/bootstrap_local_vm/03_engine_initial_tasks.yml.orig	2026-01-06 01:00:00.000000000 +0100
--- /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/bootstrap_local_vm/03_engine_initial_tasks.yml	2026-02-04 16:43:23.966315249 +0100
***************
*** 3,10 ****
    block:
      - name: Wait for the local VM
        ansible.builtin.wait_for_connection:
!         delay: 5
!         timeout: 3600
      - name: Add an entry for this host on /etc/hosts on the local VM
        ansible.builtin.lineinfile:
          dest: /etc/hosts
--- 3,20 ----
    block:
      - name: Wait for the local VM
        ansible.builtin.wait_for_connection:
!         delay: 30
!         timeout: 600
!     - name: DEBUG - Test manual SSH connection
!       ansible.builtin.shell: |
!         ssh -v root@{{ hostvars[he_ansible_host_name]['local_vm_ip']['stdout_lines'][0] }} 'echo Connected'
!       delegate_to: localhost
!       ignore_errors: yes
!
!     - name: DEBUG - Print connection variables
!       ansible.builtin.debug:
!         msg: "Target host: {{ hostvars[he_ansible_host_name]['local_vm_ip']['stdout_lines'][0] }} | FQDN: {{ he_fqdn }} Ansible connection: {{ ansible_connection | default('ssh') }}"
!     - name: Add an entry for this host on /etc/hosts on the local VM
        ansible.builtin.lineinfile:
          dest: /etc/hosts
```

And after the timeout there is the answer (in my case) - while with the 5 s delay and 3600 s timeout the module just froze:

```
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Register the engine FQDN as a host]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Wait for the local VM]
[ ERROR ] fatal: [localhost -> 192.168.222.5]: FAILED! => {"changed": false, "elapsed": 630, "msg": "timed out waiting for ping module test: to use the 'ssh' connection type with passwords or pkcs11_provider, you must install the sshpass program"}
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Sync on engine machine]
[ ERROR ] fatal: [localhost]: FAILED! => {"msg": "to use the 'ssh' connection type with passwords or pkcs11_provider, you must install the sshpass program"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
```

The problem was the lack of "sshpass", which you fix with:

```
dnf install sshpass
```

**** please add sshpass as an ovirt-hosted-engine-setup dependency ****
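A minimal pre-flight check on the host before starting the deployment, sketched here on the assumption that dnf is available as in the notes above, would avoid hitting the timeout at all:

```
# sshpass is needed for Ansible's password-based SSH connection to the bootstrap VM;
# install it up front so "Wait for the local VM" does not hang until timeout.
rpm -q sshpass || dnf install -y sshpass
```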
These are my brief notes on the installation of oVirt engine 4.5.7 over a 4.5.6 engine under el8. This is my environment:

* Almalinux 10.1: x86_64_v2
* ovirt-engine-appliance-almalinux10-4.5.7-1.el10.x86_64

I upgraded the node from el8 (CentOS) via el9 (CentOS) to el10 (AlmaLinux) because the el8 4.5.6 engine cannot connect to el10 directly. I found many issues moving from my old installation. After a successful installation you have to put the whole cluster in maintenance mode (powering off all the VMs), change the compatibility level of the cluster (4.8) and reinstall all the nodes. (Please take care that this is only a brief memo of my history.)

For the deployment of the hosted-engine this is what I did. Please note that it isn't a step-by-step instruction; these are only my working notes, arrived at after many, many iterations.

-----------------------------------

Install 4.5.7:

```
dnf install sshpass
hosted-engine --deploy --4 --restore-from-file=260204-backup.tar.gz --config-append=260204-answers-deploy.conf
```

Please NOTE that you have to use the same settings as the original engine: e.g. if you had keycloak disabled in the original, you have to set it disabled here or in the "--config-append" file. Please also note NOT TO USE the same hosted-engine storage for the new engine! You have to set up a NEW domain.

```
patch -p0 -d/ <<EOF
*** /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/bootstrap_local_vm/03_engine_initial_tasks.yml.orig	2026-01-06 01:00:00.000000000 +0100
--- /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/bootstrap_local_vm/03_engine_initial_tasks.yml	2026-02-04 17:13:38.457406310 +0100
***************
*** 3,10 ****
    block:
      - name: Wait for the local VM
        ansible.builtin.wait_for_connection:
!         delay: 5
!         timeout: 3600
      - name: Add an entry for this host on /etc/hosts on the local VM
        ansible.builtin.lineinfile:
          dest: /etc/hosts
--- 3,18 ----
    block:
      - name: Wait for the local VM
        ansible.builtin.wait_for_connection:
!         delay: 30
!         timeout: 600
!     - name: DEBUG - Test manual SSH connection
!       ansible.builtin.debug:
!         msg: "ssh -v root@{{ hostvars[he_ansible_host_name]['local_vm_ip']['stdout_lines'][0] }} 'echo Connected'"
!
!     - name: DEBUG - Print connection variables
!       ansible.builtin.debug:
!         msg: "Target host: {{ hostvars[he_ansible_host_name]['local_vm_ip']['stdout_lines'][0] }} | FQDN: {{ he_fqdn }} Ansible connection: {{ ansible_connection | default('ssh') }}"
!     - name: Add an entry for this host on /etc/hosts on the local VM
        ansible.builtin.lineinfile:
          dest: /etc/hosts
EOF

patch -p0 -d/ <<EOF
*** /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/bootstrap_local_vm/04_engine_final_tasks.yml.orig	2026-02-06 16:31:52.827175249 +0100
--- /usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/bootstrap_local_vm/04_engine_final_tasks.yml	2026-02-06 16:32:23.427533682 +0100
***************
*** 10,18 ****
    # After a restart the engine has a 5 minute grace time,
    # other actions like electing a new SPM host or reconstructing
    # the master storage domain could require more time
!   - name: Wait for the engine to reach a stable condition
      ansible.builtin.wait_for:
!       timeout: "600"
      when: he_restore_from_file is defined and he_restore_from_file
    - name: Configure LibgfApi support
      ansible.builtin.command: engine-config -s LibgfApiSupported=true --cver=4.2
--- 10,18 ----
    # After a restart the engine has a 5 minute grace time,
    # other actions like electing a new SPM host or reconstructing
    # the master storage domain could require more time
!   - name: Wait for the engine to reach a stable condition (600s too much)
      ansible.builtin.wait_for:
!       timeout: 180
      when: he_restore_from_file is defined and he_restore_from_file
    - name: Configure LibgfApi support
      ansible.builtin.command: engine-config -s LibgfApiSupported=true --cver=4.2
EOF

. ~/.profile
export http_proxy=http://proxy.dmz.ssis:3128
export https_proxy=http://proxy.dmz.ssis:3128
export ftp_proxy=http://proxy.dmz.ssis:3128
export no_proxy=.ovirt

# connect to ovirt-engine@localhost
echo "proxy=http://proxy.dmz.ssis:3128" | tee -a /etc/yum.conf
while [ `ls -1 /var/log/ovirt-engine/setup | wc -l` -eq 0 ]; do echo -n "."; sleep 1; done

su - postgres
psql -c "ALTER DEFAULT PRIVILEGES FOR ROLE postgres IN SCHEMA public GRANT SELECT ON TABLES TO ovirt_engine_history_grafana;"
exit

link=/var/lib/grafana/plugins/performancecopilot-pcp-app
while :; do
    if [ -L "$link" ] && ! [ -e "$link" ]; then
        echo "Removing broken link"
        rm -f "$link"
        grafana cli plugins install performancecopilot-pcp-app
        break
    fi
    echo -n "."
    sleep 1
done

patch -p0 -d/ <<EOF
*** /usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.py.orig	2026-02-04 20:53:23.672000000 +0100
--- /usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.py	2026-02-04 20:26:23.619000000 +0100
***************
*** 63,69 ****
      def _processTemplate(self, template, dir, mode=None):
          out = os.path.join(
              dir,
!             re.sub('\.in$', '', os.path.basename(template)),
          )
          with open(template, 'r', encoding='utf-8') as f:
              t = Template(f.read())
--- 63,69 ----
      def _processTemplate(self, template, dir, mode=None):
          out = os.path.join(
              dir,
!             re.sub(r'\.in$', '', os.path.basename(template)),
          )
          with open(template, 'r', encoding='utf-8') as f:
              t = Template(f.read())
EOF

chmod a+r /etc/pki/ovirt-engine/keys/engine.p12; chmod a+r /etc/pki/ovirt-engine/keys/jboss.p12

tail -F /var/log/ovirt-engine/setup/*.log    # ansible: Make sure `ovirt-engine` service is running
curl http://localhost/ovirt-engine/services/health
tail -F /var/log/ovirt-engine/engine.log     # ansible: Wait for the engine to reach a stable condition (600 seconds - too much)
```

After that, follow the ansible instructions. Finally you have the hosted-engine fully renewed in the cluster.
There are 2 post-notes:

1. After the engine finally runs correctly, the ansible playbook wants to copy it onto the hosted-engine storage. The problem here is in this recipe:

```
/usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/create_target_vm/03_hosted_engine_final_tasks.yml:

- name: Inject network configuration with guestfish
  ansible.builtin.command: >-
    guestfish -a {{ local_vm_disk_path }} --rw -i copy-in "{{ he_local_vm_dir }}/ifcfg-eth0"
    /etc/sysconfig/network-scripts {{ ":" }} selinux-relabel
    /etc/selinux/targeted/contexts/files/file_contexts
    /etc/sysconfig/network-scripts/ifcfg-eth0 force{{ ":" }}true
  environment:
    LIBGUESTFS_BACKEND: direct
    LANG: en_US.UTF-8
    LC_MESSAGES: en_US.UTF-8
    LC_ALL: en_US.UTF-8
  changed_when: true
```

This part wants to inject the file ifcfg-eth0 into /etc/sysconfig/network-scripts, but in Almalinux 10.1 /etc/sysconfig/network-scripts is deprecated, so the command fails. You can work around it by issuing, in the ovirt-engine VM before this playbook runs: `install -o root -d /etc/sysconfig/network-scripts`

2. We have a problem with the dwh daemon. It doesn't start because Almalinux 10.1 lacks the com/ongres/scram/common/stringprep/StringPreparation Java class. See https://github.com/oVirt/ovirt-dwh/issues/78 for my solution.
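A possible way to apply that first workaround from the host, sketched under the assumption that the bootstrap VM is still reachable over SSH at its temporary address (192.168.222.5 in the log above):

```
# Recreate the legacy network-scripts directory inside the bootstrap engine VM
# so the later guestfish copy-in of ifcfg-eth0 has an existing target directory.
ssh root@192.168.222.5 'install -o root -d /etc/sysconfig/network-scripts'
```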
participants (5)
- Diego Ercolani
- Jonas
- Jonas Liechti
- Sandro Bonazzola
- Strahil Nikolov