Hello,
TL;DR: the engine stops talking to a host after rebooting it.
[oVirt 4.2.3.5-1.el7.centos]
- From the web GUI, I upgrade a host, leaving the reboot checkbox checked
- the upgrade is OK (/var/log/yum.log shows successful updates, and the
Ansible host-deploy log is also OK)
- reboot is OK (clean, SSH OK...)
- the host eventually appears as "Install failed"
- engine.log says:
2018-06-19 10:02:24,896+02 ERROR [org.ovirt.engine.core.bll.SshHostRebootCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [6e32b3ac] SSH reboot command failed on host 'serv-hv-prds06': SSH session timeout host 'root@serv-hv-prds06' Stdout: Stderr:
2018-06-19 10:02:25,028+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [6e32b3ac] EVENT_ID: SYSTEM_FAILED_SSH_HOST_RESTART(198), A restart using SSH initiated by the engine to Host serv-hv-prds06 has failed.
2018-06-19 10:02:25,185+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [6e32b3ac] START, SetVdsStatusVDSCommand(HostName = serv-hv-prds06, SetVdsStatusVDSCommandParameters:{hostId='9c1566a4-8432-4de6-b30d-fd3b8e5fafca', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 833f9bd
2018-06-19 10:02:25,191+02 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [6e32b3ac] FINISH, SetVdsStatusVDSCommand, log id: 833f9bd
2018-06-19 10:02:25,191+02 ERROR [org.ovirt.engine.core.bll.hostdeploy.UpgradeHostInternalCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [6e32b3ac] Engine failed to restart via ssh host 'serv-hv-prds06' ('9c1566a4-8432-4de6-b30d-fd3b8e5fafca') after upgrade
2018-06-19 10:02:25,256+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-7) [8b7c6e7d-1a22-407c-818b-849e67b94051] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host serv-hv-prds06 (User: necarnot@sdis.isere.fr@SDIS38-authz).
2018-06-19 10:02:30,755+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-69) [8b7c6e7d-1a22-407c-818b-849e67b94051] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host serv-hv-prds06 (User: necarnot@sdis.isere.fr@SDIS38-authz).
- Manually activating the host puts it back on track without issue
Otherwise, SSH communication between the engine and the hosts is very
sound (VM migrations, maintenance...).
On this oVirt DC, I have reproduced this issue twice, on two different hosts.
In the engine log above, you can see that I'm using my own account to manage
this engine, as I have been doing for years without any issue.
I'll try the exact same steps with admin@internal to see whether anything
changes, but I don't see how the account could be related.
What other logs could I give you to debug this?
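To collect everything from one upgrade attempt, here is a minimal Python sketch that keeps only the engine.log entries tagged with the correlation IDs seen above ([6e32b3ac] and [8b7c6e7d-1a22-407c-818b-849e67b94051]). It assumes that every entry starts with a timestamp like "2018-06-19 10:02:24,896+02" and that continuation lines (stack traces) belong to the preceding entry. The script and function names are my own, not part of oVirt.

```python
import re
import sys
from typing import Iterable, List

# Assumption: each engine.log entry begins with a timestamp such as
# "2018-06-19 10:02:24,896+02"; any other line continues the previous entry.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2}")


def split_entries(lines: Iterable[str]) -> List[str]:
    """Group raw lines into entries, keeping continuation lines attached."""
    entries: List[str] = []
    current: List[str] = []
    for line in lines:
        if ENTRY_START.match(line) and current:
            entries.append("".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("".join(current))
    return entries


def entries_for_ids(lines: Iterable[str], correlation_ids: Iterable[str]) -> List[str]:
    """Keep only entries containing one of the bracketed correlation IDs."""
    tags = [f"[{cid}]" for cid in correlation_ids]
    return [entry for entry in split_entries(lines) if any(tag in entry for tag in tags)]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python3 engine_log_ids.py <engine.log> <correlation-id> [...]
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for entry in entries_for_ids(fh, sys.argv[2:]):
            sys.stdout.write(entry)
```

On a default install the engine log should be /var/log/ovirt-engine/engine.log, so something like `python3 engine_log_ids.py /var/log/ovirt-engine/engine.log 6e32b3ac 8b7c6e7d-1a22-407c-818b-849e67b94051` would print only the entries from this run.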
Regards,
--
Nicolas ECARNOT