Well, replying to myself with more information.
Thinking that the network was part of the problem, I tried stopping the gluster volumes, stopping gluster on the host, and bringing down bond0.
So the host now had just em1, with one IP.
And... the winner is... Yes: the install passed "[Get local VM IP]" and continued!!
I hit ctrl-c, brought bond0 back up and restarted the deploy: it crashed. So it seems that having more than one network is the problem. But how do I install the engine on gluster over a separate, bonded, jumbo-frame network in this case???
Can you reproduce this on your side?
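For anyone trying to reproduce this, here is a rough sketch of the isolation steps described above. The exact commands are an assumption (the message does not list them): `gluster --mode=script volume stop`, `systemctl stop glusterd` and `ifdown bond0` are one plausible way to do it on CentOS 7 with the legacy network scripts. The helper names `run` and `isolate_mgmt_nic` are made up for illustration, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: the commands are assumed, not taken from the original message.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it (as root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Leave the host with em1 only: stop the gluster volumes, then glusterd, then bond0.
isolate_mgmt_nic() {
    for vol in ENGINE ISO EXPORT DATA; do
        # --mode=script suppresses the interactive confirmation prompt
        run gluster --mode=script volume stop "$vol"
    done
    run systemctl stop glusterd
    run ifdown bond0
}
```

Calling `isolate_mgmt_nic` with the default DRY_RUN=1 lists the six commands without touching the host. Stopping a volume affects every node that uses it, so this is only meant for a test setup like the one described here.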
Frank
On Monday, June 25, 2018 16:50 CEST, "fsoyer" <fsoyer@systea.fr> wrote:
Hi staff,
Installing a fresh oVirt on an up-to-date CentOS 7.5.1804, oVirt version:
# rpm -qa | grep ovirt
ovirt-hosted-engine-ha-2.2.11-1.el7.centos.noarch
ovirt-imageio-common-1.3.1.2-0.el7.centos.noarch
ovirt-host-dependencies-4.2.2-2.el7.centos.x86_64
ovirt-vmconsole-1.0.5-4.el7.centos.noarch
ovirt-provider-ovn-driver-1.2.10-1.el7.centos.noarch
ovirt-hosted-engine-setup-2.2.20-1.el7.centos.noarch
ovirt-engine-appliance-4.2-20180504.1.el7.centos.noarch
python-ovirt-engine-sdk4-4.2.6-2.el7.centos.x86_64
ovirt-host-deploy-1.7.3-1.el7.centos.noarch
ovirt-release42-4.2.3.1-1.el7.noarch
ovirt-vmconsole-host-1.0.5-4.el7.centos.noarch
cockpit-ovirt-dashboard-0.11.24-1.el7.centos.noarch
ovirt-setup-lib-1.1.4-1.el7.centos.noarch
ovirt-imageio-daemon-1.3.1.2-0.el7.centos.noarch
ovirt-host-4.2.2-2.el7.centos.x86_64
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
ON PHYSICAL SERVERS (not on VMware, why should I be?? ;) I got exactly the same error:
[ INFO ] TASK [Get local VM IP]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:69:3a:c6 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.073313", "end": "2018-06-25 16:11:36.025277", "rc": 0, "start": "2018-06-25 16:11:35.951964", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO ] Stage: Clean up
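The failing task simply polls the libvirt "default" network for a DHCP lease matching the VM's MAC address, and the empty "stdout" means no lease was ever recorded. To repeat that check by hand while the deployment is waiting, the pipeline from the error message can be wrapped in a small function. This is a minimal sketch: the function name `lease_ip_for_mac` is made up, and the pipeline is the one quoted in the log above.

```shell
#!/bin/sh
# Read `virsh net-dhcp-leases` output on stdin and print the IP address
# (without the /prefix) leased to the MAC address given as $1.
# Same grep/awk/cut pipeline as in the failing task; the function name is hypothetical.
lease_ip_for_mac() {
    mac="$1"
    grep -i -- "$mac" | awk '{ print $5 }' | cut -f1 -d'/'
}

# Usage on the host, with a read-only libvirt connection:
#   virsh -r net-dhcp-leases default | lease_ip_for_mac 00:16:3e:69:3a:c6
```

If this prints nothing while the VM is running, the VM has not obtained a lease from the DHCP server on virbr0, which is exactly the condition the task retries 50 times before giving up.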
I have 4 NICs:
em1 10.0.0.230/8 is for ovirtmgmt, it has the gateway
em2 10.0.0.229/8 is for a VM network
em3+em4 in bond0 192.168.0.30 are for gluster with jumbo frames; the volumes (ENGINE, ISO, EXPORT, DATA) are up and operational.
I tried to disable em2 (ONBOOT=no and a network restart), so the network currently looks like this:
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether e0:db:55:15:eb:70 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.230/8 brd 10.255.255.255 scope global em1
valid_lft forever preferred_lft forever
inet6 fe80::e2db:55ff:fe15:eb70/64 scope link
valid_lft forever preferred_lft forever
3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether e0:db:55:15:eb:71 brd ff:ff:ff:ff:ff:ff
4: em3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP group default qlen 1000
link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff
5: em4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP group default qlen 1000
link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.30/24 brd 192.168.0.255 scope global bond0
valid_lft forever preferred_lft forever
inet6 fe80::e2db:55ff:fe15:eb72/64 scope link
valid_lft forever preferred_lft forever
# ip r
default via 10.0.1.254 dev em1
10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.230
169.254.0.0/16 dev em1 scope link metric 1002
169.254.0.0/16 dev bond0 scope link metric 1006
192.168.0.0/24 dev bond0 proto kernel scope link src 192.168.0.30
but same issue, after "/usr/sbin/ovirt-hosted-engine-cleanup" and restarting the deployment. NetworkManager was stopped and disabled at node install, and it is still stopped.
After the error, the network shows this after device 6 (bond0) :
7: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:54:00:38:e0:5a brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
8: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
link/ether 52:54:00:38:e0:5a brd ff:ff:ff:ff:ff:ff
11: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 1000
link/ether fe:16:3e:69:3a:c6 brd ff:ff:ff:ff:ff:ff
inet6 fe80::fc16:3eff:fe69:3ac6/64 scope link
valid_lft forever preferred_lft forever
I do not see ovirtmgmt... And I don't know if I can access the engine VM as I don't have its IP :(
I tried to ping the addresses after 192.168.122.1, but none are reachable, so I stopped at 122.10. The VM seems to be up (kvm process), with the qemu-kvm process taking 150% of CPU in "top"...
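The manual ping test above can be scripted to cover more of the 192.168.122.0/24 range. This is only a sketch: the function name `sweep_virbr0` and its default bounds are made up, and it assumes the Linux iputils `ping` with its `-c` (count) and `-W` (timeout in seconds) options.

```shell
#!/bin/sh
# Print every address in 192.168.122.<first>..<last> that answers a single ping.
# Hypothetical helper; the defaults cover .2 to .254 on the libvirt default network.
sweep_virbr0() {
    first="${1:-2}"
    last="${2:-254}"
    i="$first"
    while [ "$i" -le "$last" ]; do
        addr="192.168.122.$i"
        # one echo request, one-second timeout, output discarded
        if ping -c 1 -W 1 "$addr" >/dev/null 2>&1; then
            echo "$addr"
        fi
        i=$((i + 1))
    done
}

# Usage: sweep_virbr0 2 50
```

An empty result does not prove the VM is down: it may simply have no address yet, or not answer ICMP.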
I pasted the log here : https://pastebin.com/Ebzh1uEh
PLEASE! This issue seems to have been recurrent since the beginning of 2018 (see messages here on the list:
Jamie Lawrence in February, suporte@logicworks.pt in April, shamilkpm@gmail.com and Yaniv Kaul in May,
florentl on June 01...). Can anyone give us a way to solve this?
--
Regards,
Frank Soyer
On Monday, June 04, 2018 16:07 CEST, Simone Tiraboschi <stirabos@redhat.com> wrote: