On Sun, Jan 26, 2020 at 8:45 PM Fredy Sanchez <fredy.sanchez@modmed.com> wrote:
Hi all,

[root@bric-ovirt-1 ~]# cat /etc/*release*
CentOS Linux release 7.7.1908 (Core)
[root@bric-ovirt-1 ~]# yum info ovirt-engine-appliance
Installed Packages
Name        : ovirt-engine-appliance
Arch        : x86_64
Version     : 4.3
Release     : 20191121.1.el7
Size        : 1.0 G
Repo        : installed
From repo   : ovirt-4.3

Same situation as https://bugzilla.redhat.com/show_bug.cgi?id=1787267. The error message almost everywhere is a red herring about ansible.

You are right that it's misleading, but were the errors below the only ones you got from ansible?
 
[ INFO  ] TASK [ovirt.hosted_engine_setup : Wait for the host to be up]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": []}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts", "version": "2.13"}]}
[ INFO  ] TASK [ovirt.hosted_engine_setup : Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20200126170315-req4qb.log

But the "real" problem seems to be SSH-related, as you can see below.

Indeed
 
[root@bric-ovirt-1 ovirt-engine]# pwd
/var/log/ovirt-hosted-engine-setup/engine-logs-2020-01-26T17:19:28Z/ovirt-engine
[root@bric-ovirt-1 ovirt-engine]# grep -i error engine.log
2020-01-26 17:26:50,178Z ERROR [org.ovirt.engine.core.bll.hostdeploy.AddVdsCommand] (default task-1) [2341fd23-f0c7-4f1c-ad48-88af20c2d04b] Failed to establish session with host 'bric-ovirt-1.corp.modmed.com': SSH session closed during connection 'root@bric-ovirt-1.corp.modmed.com'

Please check/share the entire portion of engine.log, from where it starts trying to ssh until it gives up.
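
One possible way to pull that portion out, as a rough sketch: the bracketed ID in the AddVdsCommand line above (2341fd23-...) appears to be the correlation ID the engine attaches to that operation, so grepping for it with some context should show the SSH attempt from start to failure. The function name engine_log_context and the default of 20 context lines are my own assumptions, not anything from oVirt.

```shell
# Hypothetical helper (name and defaults are assumptions): print every line
# of a log file that contains a given ID, with N lines of context around it.
engine_log_context() {
    log_file="$1"          # e.g. engine.log
    corr_id="$2"           # e.g. the bracketed ID from the ERROR line
    context="${3:-20}"     # lines of context before and after each match
    # -F: treat the ID as a fixed string, -n: show line numbers,
    # -C: include surrounding lines so the whole SSH attempt is visible
    grep -n -F -C "$context" -- "$corr_id" "$log_file"
}

# Usage, run from the engine-logs directory shown above:
#   engine_log_context engine.log 2341fd23-f0c7-4f1c-ad48-88af20c2d04b
```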
 
2020-01-26 17:26:50,205Z ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-1) [] Operation Failed: [Cannot add Host. Connecting to host via SSH has failed, verify that the host is reachable (IP address, routable address etc.) You may refer to the engine.log file for further details.]

The funny thing is that the engine can indeed ssh to bric-ovirt-1 (physical host). See below.

[root@bric-ovirt-1 ovirt-hosted-engine-setup]# cat /etc/hosts
192.168.1.52 bric-ovirt-engine.corp.modmed.com # temporary entry added by hosted-engine-setup for the bootstrap VM
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.130.0.50 bric-ovirt-engine bric-ovirt-engine.corp.modmed.com
10.130.0.51 bric-ovirt-1 bric-ovirt-1.corp.modmed.com
10.130.0.52 bric-ovirt-2 bric-ovirt-2.corp.modmed.com
10.130.0.53 bric-ovirt-3 bric-ovirt-3.corp.modmed.com
192.168.0.1 bric-ovirt-1gluster bric-ovirt-1gluster.corp.modmed.com
192.168.0.2 bric-ovirt-2gluster bric-ovirt-2gluster.corp.modmed.com
192.168.0.3 bric-ovirt-3gluster bric-ovirt-3gluster.corp.modmed.com
[root@bric-ovirt-1 ovirt-hosted-engine-setup]#

[root@bric-ovirt-1 ~]# ssh 192.168.1.52
Last login: Sun Jan 26 17:55:20 2020 from 192.168.1.1
[root@bric-ovirt-engine ~]#
[root@bric-ovirt-engine ~]#
[root@bric-ovirt-engine ~]# ssh bric-ovirt-1
Password:
Password:
Last failed login: Sun Jan 26 18:17:16 UTC 2020 from 192.168.1.52 on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Sun Jan 26 18:16:46 2020
###################################################################
# UNAUTHORIZED ACCESS TO THIS SYSTEM IS PROHIBITED                #
#                                                                 #
# This system is the property of Modernizing Medicine, Inc.       #
# It is for authorized Company business purposes only.            #
# All connections are monitored and recorded.                     #
# Disconnect IMMEDIATELY if you are not an authorized user!       #
###################################################################
[root@bric-ovirt-1 ~]#
[root@bric-ovirt-1 ~]#
[root@bric-ovirt-1 ~]# exit
logout
Connection to bric-ovirt-1 closed.
[root@bric-ovirt-engine ~]#
[root@bric-ovirt-engine ~]#
[root@bric-ovirt-engine ~]# ssh bric-ovirt-1.corp.modmed.com
Password:
Last login: Sun Jan 26 18:17:22 2020 from 192.168.1.52
###################################################################
# UNAUTHORIZED ACCESS TO THIS SYSTEM IS PROHIBITED                #
#                                                                 #
# This system is the property of Modernizing Medicine, Inc.       #
# It is for authorized Company business purposes only.            #
# All connections are monitored and recorded.                     #
# Disconnect IMMEDIATELY if you are not an authorized user!       #
###################################################################

Can you please try this, from the engine machine:

ssh root@bric-ovirt-1.corp.modmed.com true

If this outputs the above "PROHIBITED" note, you'll have to configure your
scripts etc. to not output it on non-interactive shells. Otherwise, this
confuses the engine - it can't really distinguish between your own output
and the output of the commands it runs there.
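
A minimal sketch of such a guard, assuming the note is printed by a shell startup script (for example ~/.bashrc or a file under /etc/profile.d/; where yours actually comes from is not visible in this thread). The function name show_login_banner is hypothetical and the banner text is shortened.

```shell
# Hypothetical snippet for a shell startup script; the function name is an
# assumption, not something taken from this thread.
show_login_banner() {
    # "$-" holds the current shell flags; it contains "i" only when the
    # shell is interactive.
    case "$-" in
        *i*) ;;            # interactive login: go on and print the banner
        *)   return 0 ;;   # non-interactive (e.g. "ssh host true"): stay silent
    esac
    printf '%s\n' "UNAUTHORIZED ACCESS TO THIS SYSTEM IS PROHIBITED"
}

show_login_banner
```

With something like this in place, the "ssh root@bric-ovirt-1.corp.modmed.com true" test above should print nothing, while a normal interactive login still shows the note.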
 
[root@bric-ovirt-1 ~]# exit
logout
Connection to bric-ovirt-1.corp.modmed.com closed.
[root@bric-ovirt-engine ~]#
[root@bric-ovirt-engine ~]#
[root@bric-ovirt-engine ~]# exit
logout
Connection to 192.168.1.52 closed.
[root@bric-ovirt-1 ~]#

So, what gives? I already disabled all SSH security on the physical host, and whitelisted all potential IPs from the engine using firewalld. Regardless, the engine can ssh to the host as root :-( Is there maybe another user that's used for the "Wait for the host to be up" SSH test? Yes, I tried both passwords and certificates.

No, that's root. You can also see that in the log.
 


Maybe what's really happening is that the engine is not getting the right IP? bric-ovirt-engine is supposed to get 10.130.0.50; instead, it never gets there and gets 192.168.1.52 from virbr0 on bric-ovirt-1. See below.

That's by design. For details, if interested, see the "Hosted Engine 4.3 Deep Dive" presentation:

https://www.ovirt.org/community/get-involved/resources/slide-decks.html
 

 --== HOST NETWORK CONFIGURATION ==--
          Please indicate the gateway IP address [10.130.0.1]
          Please indicate a nic to set ovirtmgmt bridge on: (p4p1, p5p1) [p4p1]:
--== VM CONFIGURATION ==--
You may specify a unicast MAC address for the VM or accept a randomly generated default [00:16:3e:17:1d:f8]:
          How should the engine VM network be configured (DHCP, Static)[DHCP]? static
          Please enter the IP address to be used for the engine VM []: 10.130.0.50
[ INFO  ] The engine VM will be configured to use 10.130.0.50/25
          Please provide a comma-separated list (max 3) of IP addresses of domain name servers for the engine VM
          Engine VM DNS (leave it empty to skip) [10.130.0.2,10.130.0.3]:
          Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?
          Note: ensuring that this host could resolve the engine VM hostname is still up to you
          (Yes, No)[No] Yes

[root@bric-ovirt-1 ~]# ip addr
3: p4p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0a:f7:f1:c6:80 brd ff:ff:ff:ff:ff:ff
    inet 10.130.0.51/25 brd 10.130.0.127 scope global noprefixroute p4p1
       valid_lft forever preferred_lft forever
28: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:25:7b:6f brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 brd 192.168.1.255 scope global virbr0
       valid_lft forever preferred_lft forever
29: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:25:7b:6f brd ff:ff:ff:ff:ff:ff
30: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 1000
    link/ether fe:16:3e:17:1d:f8 brd ff:ff:ff:ff:ff:ff

The newly created engine VM does remain up even after hosted-engine --deploy errors out; just at the wrong IP. I haven't been able to make it get its real IP.

It gets its real IP only after the real engine VM is created and connected to the correct network.

The current engine VM you see is a libvirt VM connected to its default (internal) network.
 
At any rate, thank you very much for taking a look at my very long email. Any and all help would be really appreciated.

Good luck and best regards,
--
Didi