On Wed, Jul 15, 2020 at 6:21 PM Andrea Chierici
<andrea.chierici(a)cnaf.infn.it> wrote:
Dear all,
I think I finally understood the issue, even though I don't know how to fix it.
When I try to deploy a new HE from a backup, I get the error:
"The host has been set in non_operational status, please check engine logs, more
info can be found in the engine logs, fix accordingly and re-deploy."
It is the host that is set to non-operational, not the hosted engine. This is clearer in another log:
Host <removed_for_privacy> is set to Non-Operational, it is missing the following
networks: 'iscsi_net,sgsi_iscsi,sgsi_priv,sgsi_vpn'
Yet two of those networks, sgsi_priv and sgsi_vpn, are present on the host:
# ip addr
<CUT>
26: sgsi_priv: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 90:e2:ba:63:2e:bc brd ff:ff:ff:ff:ff:ff
    inet6 fe80::92e2:baff:fe63:2ebc/64 scope link
28: sgsi_vpn: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 90:e2:ba:63:2e:bc brd ff:ff:ff:ff:ff:ff
    inet6 fe80::92e2:baff:fe63:2ebc/64 scope link
       valid_lft forever preferred_lft forever
The other two (iscsi_net and sgsi_iscsi) are configured in oVirt but cannot be configured on the
bare-metal system; indeed, if I run "ip addr" on a production host, I don't see those networks at all.
I am puzzled. This is definitely the problem; can anyone suggest how to proceed?
Why is it complaining about sgsi_priv and sgsi_vpn, which are not missing at all?
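The comparison above can be made explicit with a short script. This is a minimal Python sketch, assuming the engine log line has exactly the format quoted above and that a same-named Linux interface (as listed under /sys/class/net, the names `ip addr` shows) is what you want to check for. The helper names are hypothetical. Keep in mind that the engine's notion of an attached network may not match what `ip addr` lists, so a present interface does not by itself settle the question.

```python
from __future__ import annotations

import os
import re

# Message text as quoted from the engine log in this thread.
MESSAGE = ("Host <removed_for_privacy> is set to Non-Operational, it is missing "
           "the following networks: 'iscsi_net,sgsi_iscsi,sgsi_priv,sgsi_vpn'")


def networks_reported_missing(message: str) -> list[str]:
    """Extract the comma-separated network names from the Non-Operational message."""
    match = re.search(r"missing the following networks: '([^']*)'", message)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def local_interfaces(sys_class_net: str = "/sys/class/net") -> set[str]:
    """Return the interface names the kernel knows about (Linux only)."""
    try:
        return set(os.listdir(sys_class_net))
    except FileNotFoundError:
        return set()


def compare(message: str, interfaces: set[str]) -> dict[str, bool]:
    """Map each network reported missing to whether a same-named interface exists."""
    return {name: name in interfaces for name in networks_reported_missing(message)}


if __name__ == "__main__":
    for name, present in compare(MESSAGE, local_interfaces()).items():
        print(f"{name}: {'interface present' if present else 'no interface'}")
```

On the host from this thread, sgsi_priv and sgsi_vpn would be reported as present and the other two as absent, which is exactly the mismatch being asked about.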
If you pass --restore-from-file, you should, IMO, be prompted at some point with
(copying from the code, I didn't test this recently):
'Pause the execution after adding this host to the '
'engine?\n'
'You will be able to iteratively connect to '
'the restored engine in order to manually '
'review and remediate its configuration before '
'proceeding with the deployment:\nplease ensure that '
'all the datacenter hosts and storage domain are '
'listed as up or in maintenance mode before '
'proceeding.\nThis is normally not required when '
'restoring an up to date and coherent backup. '
'(@VALUES@)[@DEFAULT@]: '
Were you? If so, you can reply 'Yes', and later on you should
get a message:
- name: Pause the execution to let the user interactively reconfigure the host
- name: Let the user connect to the bootstrap engine to manually fix host configuration
  msg: >-
    You can now connect to {{ bootstrap_engine_url }} and
    check the status of this host and
    eventually remediate it, please continue only when the
    host is listed as 'up'
- name: Pause execution until {{ he_setup_lock_file.path }} is removed, delete it once ready to proceed
At this point, the deploy process will wait until you remove this
file before continuing.
Then you can log in to the engine Admin UI and change whatever is needed on
the host, including configuring networks, until you manage to bring it 'Up'.
Then remove the file.
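The pause behaves like a "wait until the file is gone" loop. A minimal Python sketch of that idea, for illustration only: it is not the actual hosted-engine-setup implementation, and the function name, poll interval and timeout are hypothetical.

```python
import os
import time


def wait_until_removed(path, poll_seconds=5.0, timeout_seconds=24 * 3600):
    """Block until `path` no longer exists.

    Returns True once the file is gone, or False if `timeout_seconds` elapse first.
    """
    deadline = time.monotonic() + timeout_seconds
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            return False  # gave up: the file is still there
        time.sleep(poll_seconds)  # check again later
    return True
```

From the user's side, "removing the file" just means deleting the path printed in the "Pause execution until ... is removed" message (for example with `rm`) once the host shows as 'Up' in the Admin UI.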
Good luck and best regards,
Andrea
On 15/07/2020 08:33, Yedidyah Bar David wrote:
On Tue, Jul 14, 2020 at 6:04 PM Andrea Chierici
<andrea.chierici(a)cnaf.infn.it> wrote:
Hi,
thank you for your help.
I think this is not a critical failure, and it is not what caused the restore to fail.
I recently tried the 4.3.11 beta and 4.4.1, and the error is now different:
[ INFO ] Upgrading CA
[ ERROR ] Failed to execute stage 'Misc configuration': (2, 'No such file or directory')
[ INFO ] DNF Performing DNF transaction rollback
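The tuple (2, 'No such file or directory') looks like the (errno, strerror) pair of a Python OSError, which would fit, since engine-setup is written in Python. Assuming that is what it is: errno 2 is ENOENT, and the file name is not part of that pair, which may be why the message does not say which file is missing; the detailed engine-setup logs are the likelier place to find it. A minimal Python 3 sketch of the distinction (the path is hypothetical):

```python
import errno
import os

# errno 2 is ENOENT, whose standard message is the text seen in the log.
print(errno.errorcode[errno.ENOENT], errno.ENOENT, os.strerror(errno.ENOENT))

# An OSError keeps (errno, strerror) in .args and the path, if known, in .filename,
# so a message built only from (errno, strerror) loses the path.
try:
    open("/nonexistent/example-file")  # hypothetical path, for illustration only
except FileNotFoundError as exc:
    print(exc.args)      # (2, 'No such file or directory')
    print(exc.filename)  # /nonexistent/example-file
```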
This is part of the 'engine-setup' output, which 'hosted-engine' runs inside
the engine VM. If you can access the engine VM, you can find more information there, in
/var/log/ovirt-engine/setup/*. Otherwise, the hosted-engine deploy script might have
managed to copy them to /var/log/ovirt-hosted-engine-setup/engine-logs*. Please
check/share these. Thanks.
Unfortunately, the installation procedure deletes the VM when it exits, so I can't
log in.
Are you sure? Did you check with 'ps' for qemu processes?
If it's still up but still using a local IP address, you can find that address
by searching the hosted-engine logs for 'local_vm_ip', and log in there
from the host.
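Both checks can also be scripted. A minimal Python sketch, assuming a Linux host (it reads /proc, roughly what `ps` does) and assuming the address appears as a dotted IPv4 string on the same log line as 'local_vm_ip'; the exact log format is not shown in this thread, and the `*.log` glob for the deploy logs is an assumption too. The function names are hypothetical.

```python
from __future__ import annotations

import glob
import ipaddress
import os
import re

# Any dotted quad; validated with ipaddress below.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def qemu_processes(proc: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, command line) for every process whose command line mentions qemu."""
    if not os.path.isdir(proc):
        return []
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited meanwhile, or not readable
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if "qemu" in cmdline:
            found.append((int(entry), cmdline))
    return sorted(found)


def local_vm_ips(lines) -> list[str]:
    """Collect distinct IPv4 addresses found on lines that mention 'local_vm_ip'."""
    ips: list[str] = []
    for line in lines:
        if "local_vm_ip" not in line:
            continue
        for candidate in IPV4.findall(line):
            try:
                ipaddress.IPv4Address(candidate)
            except ipaddress.AddressValueError:
                continue  # not a real address, e.g. 999.1.1.1
            if candidate not in ips:
                ips.append(candidate)
    return ips


if __name__ == "__main__":
    for pid, cmd in qemu_processes():
        print(pid, cmd[:120])
    # Log location and naming are assumptions; adjust to where your deploy logs are.
    for path in sorted(glob.glob("/var/log/ovirt-hosted-engine-setup/*.log")):
        with open(path, errors="replace") as fh:
            print(path, local_vm_ips(fh))
```

If a qemu process is listed and an address turns up, `ssh root@<that address>` from the host is the next step.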
Here are the ERROR messages I found in the logs copied to the host:
engine.log:2020-07-08 15:05:04,178+02 ERROR
[org.ovirt.engine.core.bll.pm.FenceProxyLocator]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [45a7e7f3] Can
not run fence action on host '<erased_for_privacy>', no suitable proxy host
was found.
That's ok.
server.log:2020-07-08 15:09:23,081+02 ERROR [org.jboss.resteasy.resteasy_jaxrs.i18n]
(default task-1) RESTEASY002010: Failed to execute: javax.ws.rs.WebApplicationException:
HTTP 404 Not Found
server.log:2020-07-08 15:14:19,804+02 ERROR [org.jboss.resteasy.resteasy_jaxrs.i18n]
(default task-1) RESTEASY002010: Failed to execute: javax.ws.rs.WebApplicationException:
HTTP 404 Not Found
This probably indicates a problem, but I agree it's not very helpful.
grep: setup: Is a directory
Right, so please search inside it.
Also please check the hosted-engine deploy logs themselves.
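"Is a directory" is what plain `grep` prints when given a directory; `grep -rn` searches recursively instead. The same recursive search as a minimal Python sketch, assuming the copied logs sit under /var/log/ovirt-hosted-engine-setup/engine-logs* and that lines containing "ERROR" or the ENOENT text are the ones worth reading first (the pattern and function name are hypothetical choices):

```python
import glob
import os
import re

# Lines worth reading first: explicit errors and the ENOENT text from the failure.
PATTERN = re.compile(r"ERROR|No such file or directory")


def search_tree(root, pattern=PATTERN):
    """Yield (path, line_number, line) for every matching line in every file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                with open(path, errors="replace") as fh:
                    for number, line in enumerate(fh, start=1):
                        if pattern.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it


if __name__ == "__main__":
    # engine-logs* is the copy the deploy script may have left on the host.
    for root in sorted(glob.glob("/var/log/ovirt-hosted-engine-setup/engine-logs*")):
        for path, number, line in search_tree(root):
            print(f"{path}:{number}: {line}")
```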
Not very helpful.
I simply can't figure out what file is missing.
If, as a test, I try to install the HE without restoring the backup, the installation
goes smoothly to the end, but at that point, as far as I understand, I can't restore
the backup.
Another option is to do the restore manually. To find relevant information, search the
net for "enginevm_before_engine_setup".
I will give it a try later.
Good luck and best regards,
--
Andrea Chierici - INFN-CNAF
Viale Berti Pichat 6/2, 40127 BOLOGNA
Office Tel: +39 051 2095463
SkypeID ataruz
--
Didi