
On Tue, Dec 1, 2020 at 5:47 PM Roberto Nunin <robnunin@gmail.com> wrote:
We are following both the oVirt upgrade guide [1] and the RHV 4.4 upgrade guide [2].
aps-te62-mng.corporate.it ---> host reinstalled with oVirt Node 4.4.3
aps-te61-mng.corporate.it ---> host where the previous ovirt-engine 4.3.10 VM was running when the backup was taken.
hosted-engine --deploy --restore-from-file=<path to file> fails with the following errors in ovirt-hosted-engine-setup:
2020-12-01 15:53:37,534+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, more info can be found in the engine logs, fix accordingly and re-deploy."}
2020-12-01 15:56:30,414+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
2020-12-01 15:56:33,731+0100 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Closing up': Failed executing ansible-playbook
2020-12-01 15:57:08,663+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host itte1lv51-mng.comifar.it port 22: Connection timed out", "skip_reason": "Host localhost is unreachable", "unreachable": true}
2020-12-01 15:58:22,179+0100 ERROR otopi.plugins.gr_he_common.core.misc misc._terminate:167 Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
while within the HostedEngineLocal engine.log:
2020-12-01 15:52:42,161+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engine-Thread-96) [11f50ce0] Host 'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
Did you notice this? When you run deploy, it also prompts you whether to pause execution after adding the host to the engine. Please reply 'Yes', and then, when it waits for you to remove a lock file, connect to the engine web admin as instructed, fix the required networks (by going to the host and setting NICs to all required networks, then trying to activate it), and then, after it's 'Up' (green), remove the lock file on the host so that deploy tries to continue. We recently changed the behavior around this, see also: https://bugzilla.redhat.com/show_bug.cgi?id=1893385
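To see at a glance which networks still need to be attached to the host's NICs, the "is missing the following networks" message from engine.log can be parsed. The following is only a minimal sketch: the function name `missing_networks` is made up for illustration, and the pattern assumes the message wording is exactly as in the log lines quoted in this thread.

```python
import re
from collections import defaultdict

# Pattern for the engine.log message quoted in this thread. The wording is
# assumed to match those lines exactly; adjust it if your log differs.
MISSING_RE = re.compile(
    r"Host '(?P<host>[^']+)' is set to Non-Operational, "
    r"it is missing the following networks: '(?P<nets>[^']*)'"
)


def missing_networks(lines):
    """Return {host name: sorted list of missing network names}.

    `lines` is any iterable of log lines (a list, or an open file object).
    Repeated messages for the same host are merged.
    """
    found = defaultdict(set)
    for line in lines:
        match = MISSING_RE.search(line)
        if match is None:
            continue
        names = (n.strip() for n in match.group("nets").split(","))
        found[match.group("host")].update(n for n in names if n)
    return {host: sorted(nets) for host, nets in found.items()}


# Example with one line copied from the engine.log excerpt in this thread.
sample = [
    "2020-12-01 15:52:42,161+01 ERROR "
    "[org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] "
    "(EE-ManagedThreadFactory-engine-Thread-96) [11f50ce0] "
    "Host 'aps-te62-mng.corporate.it' is set to Non-Operational, "
    "it is missing the following networks: "
    "'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'"
]
for host, nets in missing_networks(sample).items():
    print(f"{host}: {', '.join(nets)}")
```

To run it against a real log, pass an open file object instead of `sample`. The resulting list is what has to be assigned to the host's NICs in the web admin before activating the host and removing the lock file.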
2020-12-01 15:52:48,474+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [41688fc7] Host 'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
2020-12-01 15:52:53,734+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-6) [5fc7257] Host 'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
2020-12-01 15:52:54,567+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-13) [] Rerun VM 'f9249e06-237e-412c-91e9-7b0fa0b6ec2a'. Called from VDS 'aps-te62-mng.corprorate.it'
2020-12-01 15:52:54,676+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-361) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: external-HostedEngineLocal, Source: aps-te62-mng.corporate.it, Destination: aps-te61-mng.corporate.it).
Why is the playbook trying to migrate HostedEngineLocal from the reinstalled oVirt Node 4.4.3 host to an existing one that is still running oVirt Node 4.3.x?
Not sure - perhaps because the new host is non-operational. I agree this is probably not very useful behavior - perhaps you want to open a bug about this, and attach all relevant logs.
How can we work around this issue and proceed with the upgrade?
See above.
[1] https://www.ovirt.org/documentation/upgrade_guide/#SHE_Upgrading_from_4-3
[2] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/htm...
Thanks in advance for your support. Best regards
Good luck and best regards, -- Didi