
On Tue, Dec 31, 2019 at 9:15 AM Sang Un Ahn <realapyo@gmail.com> wrote:
Hi,
I have figured out that the root cause of the deployment failure is a timeout while the hosted engine was trying to connect to the host via SSH, as shown in engine.log (located in /var/log/ovirt-hosted-engine-setup/engine-logs-2019-12-31T06:34:38Z/ovirt-engine):
Nice catch. Sounds indeed more reasonable than the issue about ovirt_host_facts (which is real, but still not failing us).
2019-12-31 15:43:06,082+09 ERROR [org.ovirt.engine.core.bll.hostdeploy.AddVdsCommand] (default task-1) [f48796e7-a4c5-4c09-a70d-956f0c4249b4] Failed to establish session with host 'alice-ovirt-01.sdfarm.kr': SSH connection timed out connecting to 'root@alice-ovirt-01.sdfarm.kr'
2019-12-31 15:43:06,085+09 WARN [org.ovirt.engine.core.bll.hostdeploy.AddVdsCommand] (default task-1) [f48796e7-a4c5-4c09-a70d-956f0c4249b4] Validation of action 'AddVds' failed for user admin@internal-authz. Reasons: VAR__ACTION__ADD,VAR__TYPE__HOST,$server alice-ovirt-01.sdfarm.kr,VDS_CANNOT_CONNECT_TO_SERVER
2019-12-31 15:43:06,129+09 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-1) [] Operation Failed: [Cannot add Host. Connecting to host via SSH has failed, verify that the host is reachable (IP address, routable address etc.) You may refer to the engine.log file for further details.]
The FQDN of the hosted engine (alice-ovirt-engine.sdfarm.kr) resolves, as does that of the host (alice-ovirt-01.sdfarm.kr), and SSH is one of the services allowed by firewalld. I believe the firewalld rules are configured automatically during the deployment so that the hosted engine and the host can work together. Root access is also configured to be allowed in the first stage of deployment.
I was just wondering how I can verify that the hosted engine can access the host at this stage. Once the deployment fails, the deployment script rolls everything back (I believe it cleans everything up), and the vm-status of hosted-engine is un-deployed.
Are you sure? Can you check with ps if the qemu process is still up? If so, you can try to ssh to it. IIUC you are still in the first stage of deploy, where the VM is on the local network. You should be able to find its IP address in the ansible logs, search for "local_vm_ip".

If it's dead: it might be that your engine tried to connect to the host at an IP address it can't reach, either due to name resolution or due to routing/firewalling etc. Both cockpit and the CLI ask you about adding entries to /etc/hosts on the engine VM. Did you reply 'Yes'?

Best regards,
--
Didi
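
For anyone following along, the two checks suggested above (finding the local VM's address in the ansible logs, then testing whether the SSH port answers) could be sketched roughly as below. This is only an illustration, not part of the hosted-engine tooling: the function names find_local_vm_ips and can_reach_tcp are made up here, and the exact layout of the "local_vm_ip" log lines is an assumption, so the code simply collects any IPv4-looking string from lines that mention it.

```python
import re
import socket

# Matches dotted-quad IPv4-looking strings; octet ranges are not validated.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_local_vm_ips(log_text):
    """Return unique IPv4 strings from lines mentioning local_vm_ip.

    Hypothetical helper: the log line format is assumed, not documented,
    so only lines containing the literal text "local_vm_ip" are scanned.
    Addresses are returned in order of first appearance.
    """
    found = []
    for line in log_text.splitlines():
        if "local_vm_ip" not in line:
            continue
        for ip in IPV4_RE.findall(line):
            if ip not in found:
                found.append(ip)
    return found


def can_reach_tcp(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout.

    Name resolution failures, refused connections and timeouts are all
    reported as False (they are all subclasses of OSError).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

One way to use it: read a setup log from /var/log/ovirt-hosted-engine-setup/ on the host, pass its text to find_local_vm_ips, and call can_reach_tcp on each address returned. Running can_reach_tcp("alice-ovirt-01.sdfarm.kr") from inside the engine VM tests the direction that actually timed out. A True result only means that port 22 accepted a TCP connection; it says nothing about whether root login or key authentication would succeed.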