OK, it seems to fail when I'm using jumbo frames everywhere.
It works well with MTU 1500.
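
One way to test the jumbo-frame theory is to check whether full-size packets actually reach the host unfragmented. The sketch below only does the arithmetic and builds the command line. It assumes IPv4 without options (20-byte header), an 8-byte ICMP header, Linux iputils ping (where -M do forbids fragmentation), and a jumbo MTU of 9000, which this thread does not state. The helper names are made up for illustration.

```python
from typing import List

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    if mtu <= IPV4_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small to carry an ICMP echo")
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(host: str, mtu: int) -> List[str]:
    """Build a Linux iputils ping command that probes the given MTU."""
    # -M do : set Don't Fragment, so oversized packets fail instead of splitting
    # -s N  : ICMP payload size in bytes
    return ["ping", "-c", "3", "-M", "do", "-s", str(max_ping_payload(mtu)), host]


if __name__ == "__main__":
    # 9000 is an assumed jumbo MTU; substitute the value actually configured.
    for mtu in (1500, 9000):
        print(" ".join(ping_command("lago-basic-suite-master-host-0", mtu)))
```

If the 9000-byte probe gets no replies while the 1500-byte one does, some hop between the engine and the host is not passing jumbo frames. That would be consistent with small SSH handshake packets getting through and larger writes timing out.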

On Mon, Oct 23, 2017 at 10:55 PM, Martin Perina <mperina@redhat.com> wrote:

On Mon, Oct 23, 2017 at 9:38 PM, Roy Golan <rgolan@redhat.com> wrote:

On Mon, 23 Oct 2017 at 21:51 Martin Perina <mperina@redhat.com> wrote:
On Mon, Oct 23, 2017 at 6:21 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
I'm failing to install hosts on o-s-t on Master.
What worries me is not that I'm failing (though it is a bit of a surprise, perhaps something I've done?), but that there are no logs around it.

Please see my response below, but which logs are you

/var/log/ovirt-engine/host-deploy is empty and so is /var/log/ovirt-engine/ansible.

Logs for both parts of host installation (host-deploy and ansible) are
in /var/log/ovirt-engine/host-deploy, but they are created only once each part has successfully started.
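
Since an empty directory just means neither part got far enough to start, a quick check like the following can tell "nothing started" apart from "started and failed". This is a minimal sketch. The function name is hypothetical and only the directory path comes from this thread.

```python
from pathlib import Path


def deploy_logs(log_dir):
    """Return log file names in log_dir, oldest first; [] if missing or empty."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir() if p.is_file()]
    # Sort by modification time so the last entry is the most recent attempt.
    files.sort(key=lambda p: p.stat().st_mtime)
    return [p.name for p in files]


if __name__ == "__main__":
    logs = deploy_logs("/var/log/ovirt-engine/host-deploy")
    if logs:
        print("most recent log:", logs[-1])
    else:
        print("no logs: neither host-deploy nor ansible has started")
```

An empty result right after a failed installation suggests the failure happened before host-deploy ran, so the place to look is the engine's own logs.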

All I'm seeing:
Host lago-basic-suite-master-host-0 installation failed. Unexpected connection termination.

2017-10-23 12:16:33,041-04 WARN  [org.apache.sshd.client.session.ClientSessionImpl] (sshd-SshClient[346b54f3]-nio2-thread-2) Exception caught: java.io.IOException: Connection timed out
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) [rt.jar:1.8.0_151]
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) [rt.jar:1.8.0_151]
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) [rt.jar:1.8.0_151]
        at sun.nio.ch.IOUtil.write(IOUtil.java:65) [rt.jar:1.8.0_151]
        at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishWrite(UnixAsynchronousSocketChannelImpl.java:582) [rt.jar:1.8.0_151]
        at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:190) [rt.jar:1.8.0_151]
        at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213) [rt.jar:1.8.0_151]
        at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293) [rt.jar:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151]


2017-10-23 12:16:33,046-04 DEBUG [org.ovirt.engine.core.uutils.ssh.SSHClient] (EE-ManagedThreadFactory-engine-Thread-1) [83a00e1] Executed: 'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALO
2017-10-23 12:16:33,056-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [83a00e1] Error during deploy dialog

So this means that the SSH connection to the host, over which host-deploy should have been started, failed. The reason is above in server.log: the SSH connection timed out. This error appears even before host-deploy is executed, which is why no host-deploy log was created.
2017-10-23 12:16:33,057-04 DEBUG [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [83a00e1] execute leave
2017-10-23 12:16:33,057-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [83a00e1] Error during host lago-basic-suite-master-host-0 install
2017-10-23 12:16:33,065-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [83a00e1] EVENT_ID: VDS_INSTALL_IN_PROGRESS_ERROR(511), An error has occurred during installation of Host lago-basic-suite-master-host-0: Unexpected connection termination.

Here is the ERROR event in audit_log for the above issue.

Perhaps reposync was stalling the RPM installation/download, and this triggered the SSH timeout?

As the host-deploy log hasn't been created, I'd say that this is the initial connection timeout, so the engine couldn't connect to the host at all. We would need to investigate on the host whether it was a firewall issue, sshd not running, or something else.
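
A first step could be to check from the engine machine whether sshd answers at all. The sketch below uses only the Python standard library. Port 22 and the 5-second timeout are assumptions, and ssh_banner is a made-up helper name. An SSH server normally sends an identification line starting with "SSH-" right after the TCP connection opens.

```python
import socket


def ssh_banner(host, port=22, timeout=5.0):
    """Try to read the SSH identification line.

    Returns (banner, None) on success or (None, error_text) on failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)  # the identification line is short
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        return None, "{}: {}".format(type(exc).__name__, exc)
    return data.decode("ascii", errors="replace").strip(), None
```

Calling ssh_banner("lago-basic-suite-master-host-0") from the engine would be the obvious use. A refused connection points at sshd not running or a rejecting firewall, while a timeout points at dropped packets. A banner this small fits in any MTU, so a successful read does not by itself rule out a jumbo-frame problem.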

2017-10-23 12:16:33,065-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [83a00e1] Error during host lago-basic-suite-master-host-0 install, preferring first exception: Unexpected connection termination
2017-10-23 12:16:33,065-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [83a00e1] Host installation failed for host 'c4138375-aa53-4c36-8907-306803ae4282', 'lago-basic-suite-master-host-0': Unexpected connection termination

Devel mailing list