Failed upgrade from SHE 4.3.10 to 4.4.3 - Host set to Non-Operational - missing networks

We are following both the oVirt upgrade guide [1] and the RHV 4.4 upgrade guide [2].

aps-te62-mng.corporate.it ---> host reinstalled with oVirt Node 4.4.3
aps-te61-mng.corporate.it ---> host where the previous ovirt-engine 4.3.10 VM was running when the backup was taken.

hosted-engine --deploy --restore-from-file=<path to file> fails with the following errors in ovirt-hosted-engine-setup:

2020-12-01 15:53:37,534+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, more info can be found in the engine logs, fix accordingly and re-deploy."}
2020-12-01 15:56:30,414+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
2020-12-01 15:56:33,731+0100 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Closing up': Failed executing ansible-playbook
2020-12-01 15:57:08,663+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host itte1lv51-mng.comifar.it port 22: Connection timed out", "skip_reason": "Host localhost is unreachable", "unreachable": true}
2020-12-01 15:58:22,179+0100 ERROR otopi.plugins.gr_he_common.core.misc misc._terminate:167 Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.

while within the HostedEngineLocal engine.log:

2020-12-01 15:52:42,161+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engine-Thread-96) [11f50ce0] Host 'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
2020-12-01 15:52:48,474+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [41688fc7] Host 'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
2020-12-01 15:52:53,734+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-6) [5fc7257] Host 'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
2020-12-01 15:52:54,567+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-13) [] Rerun VM 'f9249e06-237e-412c-91e9-7b0fa0b6ec2a'. Called from VDS 'aps-te62-mng.corprorate.it'
2020-12-01 15:52:54,676+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-361) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: external-HostedEngineLocal, Source: aps-te62-mng.corporate.it, Destination: aps-te61-mng.corporate.it).

Why is the playbook trying to migrate HostedEngineLocal from the reinstalled 4.4.3 oVirt node to an existing one that is still running oVirt Node 4.3.x?

How can we manage this issue and proceed with the upgrade?

[1] https://www.ovirt.org/documentation/upgrade_guide/#SHE_Upgrading_from_4-3
[2] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/htm...

Thanks in advance for support.

Best regards
--
Roberto Nunin

Try these commands on the host: vdsm-tool remove-config, then vdsm-tool configure --force. If that does not work, you can stop your VMs with virsh, install a "clean" hosted engine setup, and import your storage with all VMs. If it is possible, I think installing a new HE and setting some parameters manually is, IMHO, a "little bit" faster than trying to restore the HE across different major releases.
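
The two commands above could be run in sequence roughly as follows. This is only a sketch, not a tested procedure: the reset_vdsm_config function name and the DRY_RUN switch are hypothetical helpers, and it assumes a root shell on the affected host. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch: reset the VDSM configuration on the host, as suggested above.
# DRY_RUN, run and reset_vdsm_config are illustrative names, not part of oVirt.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Only show the command instead of executing it.
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_vdsm_config() {
    # Stop at the first failing step.
    run vdsm-tool remove-config &&
    run vdsm-tool configure --force
}
```

Calling reset_vdsm_config prints the two commands; set DRY_RUN=0 beforehand to actually execute them.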

On Tue, Dec 1, 2020 at 5:47 PM Roberto Nunin <robnunin@gmail.com> wrote:
> We are following both oVirt upgrade guide [1] and RHV 4.4 upgrade guide [2].
> aps-te62-mng.corporate.it ---> host reinstalled with oVirt Node 4.4.3
> aps-te61-mng.corporate.it ---> host where previous ovirt-engine 4.3.10 VM was running when backup was taken.
> hosted-engine --deploy --restore-from-file=<path to file> fails with following errors in ovirt-hosted-engine-setup:
> 2020-12-01 15:53:37,534+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, more info can be found in the engine logs, fix accordingly and re-deploy."}
> 2020-12-01 15:56:30,414+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
> 2020-12-01 15:56:33,731+0100 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Closing up': Failed executing ansible-playbook
> 2020-12-01 15:57:08,663+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host itte1lv51-mng.comifar.it port 22: Connection timed out", "skip_reason": "Host localhost is unreachable", "unreachable": true}
> 2020-12-01 15:58:22,179+0100 ERROR otopi.plugins.gr_he_common.core.misc misc._terminate:167 Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
> while within the HostedEngineLocal engine.log:
> 2020-12-01 15:52:42,161+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engine-Thread-96) [11f50ce0] Host 'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
Did you notice this? When you run deploy, it also asks whether to pause execution after adding the host to the engine. Please reply 'Yes'. Then, when it waits for you to remove a lock file, connect to the engine web admin as instructed and fix the required networks: go to the host, assign NICs to all required networks, and try to activate it. Once the host is 'Up' (green), remove the lock file on the host so that deploy tries to continue.

We recently changed the behavior around this, see also: https://bugzilla.redhat.com/show_bug.cgi?id=1893385
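
To see which networks have to be attached before the host can be activated, the list can be pulled out of the engine.log message quoted above. A minimal sketch, assuming the message format shown in this thread; the missing_networks function name is hypothetical, and the log path in the usage comment is the usual engine log location, which should be verified on the engine VM.

```shell
#!/bin/sh
# Sketch: print the networks named in "is missing the following networks"
# messages, one per line, without duplicates. Reads log text on stdin.
missing_networks() {
    # Capture the comma-separated list between the single quotes,
    # split it on commas, and de-duplicate across repeated messages.
    sed -n "s/.*is missing the following networks: '\([^']*\)'.*/\1/p" |
        tr ',' '\n' |
        LC_ALL=C sort -u
}

# Possible usage (path is an assumption, check your installation):
#   missing_networks < /var/log/ovirt-engine/engine.log
```

Each name it prints is a network the host's NICs still need before the engine will accept the host as operational.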
> 2020-12-01 15:52:48,474+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [41688fc7] Host 'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
> 2020-12-01 15:52:53,734+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-6) [5fc7257] Host 'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
> 2020-12-01 15:52:54,567+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-13) [] Rerun VM 'f9249e06-237e-412c-91e9-7b0fa0b6ec2a'. Called from VDS 'aps-te62-mng.corprorate.it'
> 2020-12-01 15:52:54,676+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-361) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: external-HostedEngineLocal, Source: aps-te62-mng.corporate.it, Destination: aps-te61-mng.corporate.it).
> Why is the playbook trying to migrate HostedEngineLocal from reinstalled 4.4.3 oVirt node to an existing one that is still running oVirt Node 4.3.x?
Not sure - perhaps because the new host is non-operational. I agree this is probably not very useful behavior - perhaps you want to open a bug about this, and attach all relevant logs.
> How can we manage this issue and proceed with the upgrade?
See above.
> [1] https://www.ovirt.org/documentation/upgrade_guide/#SHE_Upgrading_from_4-3
> [2] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/htm...
> Thanks in advance for support. Best regards
Good luck and best regards, -- Didi

I had issues with 4.4 and the network setup crashing. I got around this by setting IPV6_DISABLED=yes and IPV6INIT=no and removing all other IPv6 entries in the ifcfg file. After that the deploy worked just fine. I ran dnf autoremove vdsm -y before I re-ran the push from my engine.
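
The ifcfg change described above could be scripted along these lines. This is a minimal sketch, not a tested procedure: the disable_ipv6_in_ifcfg function name and the file path in the usage comment are hypothetical, and it assumes that every line whose key starts with IPV6 should be dropped before the two settings are appended. Back up the file first.

```shell
#!/bin/sh
# Sketch: drop all IPV6* entries from an ifcfg file, then append the two
# settings mentioned above. The function name is illustrative only.
disable_ipv6_in_ifcfg() {
    cfg="$1"
    tmp="$(mktemp)" || return 1
    # Keep every line that does not start with an IPV6* key.
    # grep exits 1 when nothing is left to keep, which is not an error here;
    # any other non-zero status (e.g. unreadable file) aborts.
    grep -v '^IPV6' "$cfg" > "$tmp" || [ "$?" -eq 1 ] || { rm -f "$tmp"; return 1; }
    printf 'IPV6_DISABLED=yes\nIPV6INIT=no\n' >> "$tmp"
    # Write back in place so the original file's owner and mode are kept.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}

# Possible usage (interface file name is an assumption):
#   disable_ipv6_in_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth0
```

Rewriting the file through a temporary copy keeps the original untouched if reading it fails.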
Participants (4): Patrick Lomakin, Roberto Nunin, thilburn@generalpacific.com, Yedidyah Bar David