On Tue, Dec 1, 2020 at 5:47 PM Roberto Nunin <robnunin(a)gmail.com> wrote:
We are following both the oVirt upgrade guide [1] and the RHV 4.4 upgrade guide [2].
aps-te62-mng.corporate.it ---> host reinstalled with oVirt Node 4.4.3
aps-te61-mng.corporate.it ---> host where the previous ovirt-engine 4.3.10 VM was running
when the backup was taken.
hosted-engine --deploy --restore-from-file=<path to file> fails with the following
errors in ovirt-hosted-engine-setup:
2020-12-01 15:53:37,534+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils
ansible_utils._process_output:109 fatal: [localhost]: FAILED! => {"changed":
false, "msg": "The host has been set in non_operational status, please
check engine logs, more info can be found in the engine logs, fix accordingly and
re-deploy."}
2020-12-01 15:56:30,414+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils
ansible_utils._process_output:109 fatal: [localhost]: FAILED! => {"changed":
false, "msg": "The system may not be provisioned according to the playbook
results: please check the logs for the issue, fix accordingly or re-deploy from
scratch.\n"}
2020-12-01 15:56:33,731+0100 ERROR otopi.context context._executeMethod:154 Failed to
execute stage 'Closing up': Failed executing ansible-playbook
2020-12-01 15:57:08,663+0100 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils
ansible_utils._process_output:109 fatal: [localhost]: UNREACHABLE! =>
{"changed": false, "msg": "Failed to connect to the host via ssh:
ssh: connect to host itte1lv51-mng.comifar.it port 22: Connection timed out",
"skip_reason": "Host localhost is unreachable",
"unreachable": true}
2020-12-01 15:58:22,179+0100 ERROR otopi.plugins.gr_he_common.core.misc
misc._terminate:167 Hosted Engine deployment failed: please check the logs for the issue,
fix accordingly or re-deploy from scratch.
while the HostedEngineLocal engine.log shows:
2020-12-01 15:52:42,161+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
(EE-ManagedThreadFactory-engine-Thread-96) [11f50ce0] Host
'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following
networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
Did you notice this?
When you run deploy, it also asks whether to pause execution after
adding the host to the engine. Please reply 'Yes'. Then, when it waits
for you to remove a lock file, connect to the engine web admin as
instructed and fix the required networks: go to the host, attach all
the required networks to its NICs, and try to activate it. Once the
host is 'Up' (green), remove the lock file on the host so that deploy
continues.
We recently changed the behavior around this, see also:
https://bugzilla.redhat.com/show_bug.cgi?id=1893385
2020-12-01 15:52:48,474+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-12) [41688fc7] Host
'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following
networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
2020-12-01 15:52:53,734+01 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-6) [5fc7257] Host
'aps-te62-mng.corporate.it' is set to Non-Operational, it is missing the following
networks: 'migration,traffic_11,traffic_202,traffic_5,traffic_555,traffic_9'
2020-12-01 15:52:54,567+01 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-13) []
Rerun VM 'f9249e06-237e-412c-91e9-7b0fa0b6ec2a'. Called from VDS
'aps-te62-mng.corporate.it'
2020-12-01 15:52:54,676+01 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-361) [] EVENT_ID:
VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: external-HostedEngineLocal,
Source: aps-te62-mng.corporate.it, Destination: aps-te61-mng.corporate.it).
Why is the playbook trying to migrate HostedEngineLocal from the reinstalled oVirt Node 4.4.3 host
to an existing one that is still running oVirt Node 4.3.x?
Not sure - perhaps because the new host is non-operational. I agree
this is probably not very useful behavior - perhaps you want to open a
bug about this, and attach all relevant logs.
How can we manage this issue and proceed with the upgrade?
See above.
Good luck and best regards,
--
Didi