Hi Didi,

Thank you for the message. 

I have checked the ansible log and found that 'local_vm_ip' is not set to the value I wanted. It seems the IP was assigned randomly, ignoring the static configuration I gave in my answers to the questionnaire. The VM's IP is set to one within the virbr0 network, which is 192.168.222.1/24, and it looks like this network is configured during the deployment, because it changes every time I start over.
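
For reference, this is how I found the address and the network (the log path is the one on my host; 'default' as the libvirt network name is my assumption and may differ):

    grep -r local_vm_ip /var/log/ovirt-hosted-engine-setup/
    ip addr show virbr0                # shows 192.168.222.1/24 here
    virsh net-dumpxml default          # DHCP range handed out on virbr0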

In any case, I can SSH to the VM with this IP and SSH back to the host from the VM. In /etc/hosts on both the VM and the host, I can see the IP and hostname were added by the deployment setup.
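
Roughly, what I verified (the placeholder stands for the address found in the logs):

    ssh root@<local_vm_ip>             # host -> VM, then ssh back
    grep alice-ovirt /etc/hosts        # entry present on both sides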

I was just wondering why the IP is not set to the one I wanted for the VM. During the deployment questionnaire, I chose 'static' for the VM IP, and the address I gave is on the public network where the ovirtmgmt bridge should be set up. I believe the ovirtmgmt bridge should be brought up during the deployment. At this stage of the hosted-engine deployment, the host has virbr0 and virbr0-nic in addition to em1 and em2. Note that the host has two NICs, one for the public network and one for the private network.
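
For completeness, this is how the host's interfaces look to me at this stage (ip -br link is just a compact listing):

    ip -br link show
    # em1, em2, virbr0, virbr0-nic present; no ovirtmgmt bridge yet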

Best regards,
Sang-Un

On Dec 31, 2019, at 16:44, Yedidyah Bar David <didi@redhat.com> wrote:

On Tue, Dec 31, 2019 at 9:15 AM Sang Un Ahn <realapyo@gmail.com> wrote:

Hi,

I have figured out that the root cause of the deployment failure is a timeout while the hosted engine was trying to connect to the host via SSH, as shown in engine.log (located in /var/log/ovirt-hosted-engine-setup/engine-logs-2019-12-31T06:34:38Z/ovirt-engine):

Nice catch. Sounds indeed more reasonable than the issue about
ovirt_host_facts (which is real, but still not failing us).


2019-12-31 15:43:06,082+09 ERROR [org.ovirt.engine.core.bll.hostdeploy.AddVdsCommand] (default task-1) [f48796e7-a4c5-4c09-a70d-956f0c4249b4] Failed to establish session with host 'alice-ovirt-01.sdfarm.kr': SSH connection timed out connecting to 'root@alice-ovirt-01.sdfarm.kr'
2019-12-31 15:43:06,085+09 WARN  [org.ovirt.engine.core.bll.hostdeploy.AddVdsCommand] (default task-1) [f48796e7-a4c5-4c09-a70d-956f0c4249b4] Validation of action 'AddVds' failed for user admin@internal-authz. Reasons: VAR__ACTION__ADD,VAR__TYPE__HOST,$server alice-ovirt-01.sdfarm.kr,VDS_CANNOT_CONNECT_TO_SERVER
2019-12-31 15:43:06,129+09 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-1) [] Operation Failed: [Cannot add Host. Connecting to host via SSH has failed, verify that the host is reachable (IP address, routable address etc.) You may refer to the engine.log file for further details.]

The FQDN of the hosted engine (alice-ovirt-engine.sdfarm.kr) resolves, as does that of the host (alice-ovirt-01.sdfarm.kr), and SSH is one of the services allowed by firewalld. I believe the firewalld rules are automatically configured during the deployment to work with the hosted engine and the host. Also, root access is configured to be allowed at the first stage of the deployment.
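
These are the checks I based that on (the expected results are my annotations):

    getent hosts alice-ovirt-01.sdfarm.kr       # resolves
    getent hosts alice-ovirt-engine.sdfarm.kr   # resolves
    firewall-cmd --list-services                # lists ssh among others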

I was just wondering how I can verify that the hosted engine can access the host at this stage. Once the deployment fails, the deployment script rolls everything back (I believe it cleans everything up) and the vm-status of hosted-engine shows as un-deployed.

Are you sure? Can you check with ps whether the qemu process is still up?
If so, you can try to ssh to it. IIUC you are still in the first stage
of the deploy, where the VM is on the local network. You should be able
to find its IP address in the ansible logs; search for "local_vm_ip".
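
For example, something like:

    ps aux | grep -i qemu      # is the local engine VM still running?
    grep -ri local_vm_ip /var/log/ovirt-hosted-engine-setup/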

If it's dead: it might be that your engine tried to connect to the
host at an IP address it can't reach, either due to name resolution or
due to routing/firewalling etc. Both cockpit and the CLI ask you about
adding entries to /etc/hosts on the engine VM. Did you reply 'Yes'?
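
If the VM is still up, you can also look at what was written there, e.g. (the placeholder stands for the local VM IP from the logs):

    ssh root@<local_vm_ip> cat /etc/hosts   # should include the host's FQDN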

Best regards,
-- 
Didi