On Tue, Dec 31, 2019 at 9:15 AM Sang Un Ahn <realapyo(a)gmail.com> wrote:
Hi,
I have figured out that the root cause of the deployment failure is a timeout while
the hosted engine was trying to connect to the host via SSH, as shown in engine.log (located in
/var/log/ovirt-hosted-engine-setup/engine-logs-2019-12-31T06:34:38Z/ovirt-engine):
Nice catch. Sounds indeed more reasonable than the issue about
ovirt_host_facts (which is real, but still not failing us).
2019-12-31 15:43:06,082+09 ERROR [org.ovirt.engine.core.bll.hostdeploy.AddVdsCommand]
(default task-1) [f48796e7-a4c5-4c09-a70d-956f0c4249b4] Failed to establish session with
host 'alice-ovirt-01.sdfarm.kr': SSH connection timed out connecting to
'root@alice-ovirt-01.sdfarm.kr'
2019-12-31 15:43:06,085+09 WARN [org.ovirt.engine.core.bll.hostdeploy.AddVdsCommand]
(default task-1) [f48796e7-a4c5-4c09-a70d-956f0c4249b4] Validation of action
'AddVds' failed for user admin@internal-authz. Reasons:
VAR__ACTION__ADD,VAR__TYPE__HOST,$server
alice-ovirt-01.sdfarm.kr,VDS_CANNOT_CONNECT_TO_SERVER
2019-12-31 15:43:06,129+09 ERROR
[org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-1) []
Operation Failed: [Cannot add Host. Connecting to host via SSH has failed, verify that the
host is reachable (IP address, routable address etc.) You may refer to the engine.log file
for further details.]
The FQDN of the hosted engine (alice-ovirt-engine.sdfarm.kr) resolves, as does that of the host
(alice-ovirt-01.sdfarm.kr), and SSH is one of the services allowed by firewalld. I
believe the firewalld rules are configured automatically during the deployment to work
with the hosted engine and the host. Also, root access is configured to be allowed at the first
stage of deployment.
I was just wondering how I can verify that the hosted engine can access the host at this
stage? Once it fails to deploy, the deployment script rolls everything back (I
believe it cleans everything up) and the vm-status of hosted-engine is un-deployed.
Are you sure? Can you check with ps if the qemu process is still up?
If so, you can try to ssh to it. IIUC you are still in the first stage
of deploy, where the VM is on the local network. You should be able to
find its IP address in the ansible logs, search for "local_vm_ip".
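A minimal sketch of that log search, in case it helps. It assumes the ansible
logs are *.log files directly under /var/log/ovirt-hosted-engine-setup (the
parent of the engine-logs directory quoted above); the file pattern and the
helper name find_local_vm_ips are illustrative guesses, not part of the
setup tooling.

```python
import re
from pathlib import Path

# Assumption: the setup logs sit directly in this directory as *.log files.
LOG_DIR = Path("/var/log/ovirt-hosted-engine-setup")

# Loose IPv4 pattern; good enough for spotting addresses in log text.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_local_vm_ips(lines):
    """Return distinct IPv4 addresses found on lines mentioning local_vm_ip."""
    found = []
    for line in lines:
        if "local_vm_ip" not in line:
            continue
        for ip in IPV4.findall(line):
            if ip not in found:
                found.append(ip)
    return found


if __name__ == "__main__":
    # Print every log file that mentions local_vm_ip, with the addresses seen.
    for log in sorted(LOG_DIR.glob("*.log")):
        with log.open(errors="replace") as fh:
            ips = find_local_vm_ips(fh)
        if ips:
            print(f"{log}: {', '.join(ips)}")
```

If an address shows up and the qemu process is still running, that is the
address to try with ssh from the host.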
If it's dead: it might be that your engine tried to connect to the
host at an IP address it can't reach, either due to name resolution or
due to routing/firewalling etc. Both cockpit and the CLI ask you about
adding entries to /etc/hosts on the engine VM. Did you reply 'Yes'?
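To tell a name-resolution problem from a routing/firewall problem, something
like the following could be run from the engine VM while it is still up. It
is only a sketch: check_ssh_reachable is a made-up helper, port 22 and the
5-second timeout are assumptions, and it tests only that the TCP port
accepts a connection, not that SSH login as root works.

```python
import socket


def check_ssh_reachable(host, port=22, timeout=5.0):
    """Resolve host, then try a plain TCP connection to the SSH port."""
    result = {"host": host, "addresses": [], "reachable": False, "error": None}

    # Step 1: name resolution (consults /etc/hosts and DNS via the resolver).
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"name resolution failed: {exc}"
        return result
    result["addresses"] = sorted({info[4][0] for info in infos})

    # Step 2: TCP connect; a timeout or refusal here points to
    # routing or firewalling rather than name resolution.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["reachable"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result
```

Calling check_ssh_reachable("alice-ovirt-01.sdfarm.kr") and looking at the
"addresses" entry shows which IP the engine VM would actually use for the
host; if that is not the address you expect, /etc/hosts or DNS is the place
to look.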
Best regards,
--
Didi