Unable to reinstall 2nd and 3rd host after restoring HE backup onto 1st host

Hi there,

We are running oVirt 4.5.4 on three RHEL 8.7 hosts with a self-hosted engine, which also runs RHEL 8.7. We are making some changes to our company network, so we followed the instructions here to move the oVirt engine onto new storage with a different IP address:

https://access.redhat.com/solutions/6529691

We used one host to redeploy the hosted engine from a backup, and we managed to get the hosted engine running on the new storage with the new IP address. We then tried to reinstall the two other hosts, but the reinstallation failed with the message:

Host ovirt2.ad.tintolav.com installation failed. Task Restart services failed to execute. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20230323083407-ovirt2.ad.tintolav.com-d950815b-f1b9-4dcf-b609-3ff1866a6b70.log

In the ovirt-engine host-deploy log we see:

[
"Traceback (most recent call last):",
" File \"/usr/bin/vdsm-tool\", line 18, in <module>",
" import vdsm.tool",
"ModuleNotFoundError: No module named 'vdsm'"
]

In the vdsm log on the host there are no errors, only a couple of warnings:

2023-03-23 07:58:22,335+0100 WARN (periodic/1) [throttled] MOM not available. Error: [Errno 2] No such file or directory (throttledlog:87)
2023-03-23 07:58:22,336+0100 WARN (periodic/1) [throttled] MOM not available, KSM stats will be missing. Error: (throttledlog:87)

We also tried a completely fresh install of ovirt2, but we ended up back at the same point with the same error.

[root@ovirt2 ~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2023-03-23 08:35:32 CET; 56min ago
  Process: 22942 ExecStart=/usr/libexec/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/libexec/vdsm/vdsmd (code=exited, status=0/SUCCESS)
  Process: 22876 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 22942 (code=exited, status=0/SUCCESS)

Mar 23 07:58:20 ovirt2.ad.tintolav.com vdsmd_init_common.sh[22876]: vdsm: Running test_space
Mar 23 07:58:20 ovirt2.ad.tintolav.com vdsmd_init_common.sh[22876]: vdsm: Running test_lo
Mar 23 07:58:20 ovirt2.ad.tintolav.com systemd[1]: Started Virtual Desktop Server Manager.
Mar 23 07:58:22 ovirt2.ad.tintolav.com vdsm[22942]: WARN MOM not available. Error: [Errno 2] No such file or directory
Mar 23 07:58:22 ovirt2.ad.tintolav.com vdsm[22942]: WARN MOM not available, KSM stats will be missing. Error:
Mar 23 08:35:32 ovirt2.ad.tintolav.com systemd[1]: Stopping Virtual Desktop Server Manager...
Mar 23 08:35:32 ovirt2.ad.tintolav.com systemd[1]: vdsmd.service: Succeeded.
Mar 23 08:35:32 ovirt2.ad.tintolav.com systemd[1]: Stopped Virtual Desktop Server Manager.
Mar 23 08:35:33 ovirt2.ad.tintolav.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Mar 23 08:35:33 ovirt2.ad.tintolav.com systemd[1]: vdsmd.service: Job vdsmd.service/start failed with result 'dependency'.
Mar 23 08:35:33 ovirt2.ad.tintolav.com systemd[1]: vdsmd.service: Job vdsmd.service/start failed with result 'dependency'.

[root@ovirt2 ~]# systemctl start vdsmd
A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.

[root@ovirt2 ~]# journalctl -xe
Mar 23 09:34:35 ovirt2.ad.tintolav.com systemd[1]: Stopped Auxiliary vdsm service for running helper functions as root.
-- Subject: Unit supervdsmd.service has finished shutting down
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit supervdsmd.service has finished shutting down.
Mar 23 09:34:35 ovirt2.ad.tintolav.com systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
-- Subject: Unit supervdsmd.service has finished start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit supervdsmd.service has finished starting up.
--
-- The start-up result is done.
Mar 23 09:34:35 ovirt2.ad.tintolav.com daemonAdapter[28677]: Traceback (most recent call last):
Mar 23 09:34:35 ovirt2.ad.tintolav.com daemonAdapter[28677]: File "/usr/libexec/vdsm/daemonAdapter", line 16, in <module>
Mar 23 09:34:35 ovirt2.ad.tintolav.com daemonAdapter[28677]: from vdsm.config import config
Mar 23 09:34:35 ovirt2.ad.tintolav.com daemonAdapter[28677]: ModuleNotFoundError: No module named 'vdsm'
Mar 23 09:34:35 ovirt2.ad.tintolav.com systemd[1]: supervdsmd.service: Main process exited, code=exited, status=1/FAILURE
Mar 23 09:34:35 ovirt2.ad.tintolav.com systemd[1]: supervdsmd.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit supervdsmd.service has entered the 'failed' state with result 'exit-code'.
Mar 23 09:34:35 ovirt2.ad.tintolav.com systemd[1]: supervdsmd.service: Service RestartSec=100ms expired, scheduling restart.
Mar 23 09:34:35 ovirt2.ad.tintolav.com systemd[1]: supervdsmd.service: Scheduled restart job, restart counter is at 5.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Automatic restarting of the unit supervdsmd.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Mar 23 09:34:35 ovirt2.ad.tintolav.com systemd[1]: Stopped Auxiliary vdsm service for running helper functions as root.
-- Subject: Unit supervdsmd.service has finished shutting down
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit supervdsmd.service has finished shutting down.
Mar 23 09:34:35 ovirt2.ad.tintolav.com systemd[1]: supervdsmd.service: Start request repeated too quickly.
Mar 23 09:34:35 ovirt2.ad.tintolav.com systemd[1]: supervdsmd.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit supervdsmd.service has entered the 'failed' state with result 'exit-code'.
Mar 23 09:34:35 ovirt2.ad.tintolav.com systemd[1]: Failed to start Auxiliary vdsm service for running helper functions as root.
-- Subject: Unit supervdsmd.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit supervdsmd.service has failed.
--
-- The result is failed.

The 3rd host gives the same error. Is this a Python dependency issue? We are currently using python3.9. Does anyone have any ideas how we can get hosts 2 and 3 up and running again in the cluster?

Thanks
James

Well, after many hours of debugging, it turned out the issue was not with the hosts but with the ovirt-engine. We didn't suspect the engine because we had successfully redeployed it on the first host. However, the engine was running the latest version of ansible, 2.14.2, which uses python3.11. Unfortunately, not all the required modules are available for python3.11 yet, so adding the hosts failed because the netaddr module was missing.

So we set python3.6 as the default version on the system, because vdsm and other services need this version. We then copied the netaddr module from the python3.9 module folder to the python3.11 module folder. At this point we could reinstall the hosts.

Somebody else had exactly the same problem at the same time; you can read the issue report here:

https://github.com/oVirt/ovirt-ansible-collection/issues/695
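
For anyone hitting a similar ModuleNotFoundError, a quick way to see which installed interpreter can actually find a given module is something like the sketch below. It is only an illustration, not part of oVirt or ansible: the function name can_import is made up here, and the interpreter and module names are simply the ones mentioned in this thread, so adjust them to your system.

```python
import shutil
import subprocess
import sys


def can_import(interpreter, module):
    """Report whether `interpreter` can locate `module`.

    Returns True or False, or None when the interpreter itself
    is not installed or not on PATH. (Hypothetical helper.)
    """
    path = shutil.which(interpreter)
    if path is None:
        return None
    # Ask the target interpreter to look the module up without importing it.
    probe = (
        "import importlib.util, sys; "
        "sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)"
    )
    result = subprocess.run(
        [path, "-c", probe, module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Interpreter and module names taken from this thread; assumptions only.
    for interpreter in ("python3.6", "python3.9", "python3.11"):
        for module in ("netaddr", "vdsm"):
            found = can_import(interpreter, module)
            status = "not installed" if found is None else ("found" if found else "MISSING")
            print(f"{interpreter:<12} {module:<10} {status}")
```

Run on the engine, a MISSING entry for netaddr under the interpreter ansible uses would point at the situation described above; run on a host, the vdsm column shows which interpreter the vdsm package was installed for.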
participants (1)
-
James Wadsworth