Forwarding to infra, as the email to infra-support didn't work.
---------- Forwarded message ---------
From: Yedidyah Bar David <didi@redhat.com>
Date: Wed, Feb 19, 2020 at 10:37 AM
Subject: ssh connection fails after restarting engine VM
To: <infra-support@ovirt.org>
Hi all,
Apparently my Jira account was stuck in some state, and all my emails
to infra-support from recent years were dropped to /dev/null. I now saw
another case of an issue I tried to open a ticket for a month ago, and
considered updating that ticket, so I took the time to actually log in,
and found that I can't find it anywhere. So I am copy/pasting my
original email below.
The new case is:
https://jenkins.ovirt.org/job/ovirt-system-tests_he-node-ng-suite-4.3/341/
lago.log has:
2020-02-19 05:10:50,890::log_utils.py::__exit__::611::lago.prefix::INFO::
# [Thread-2] lago-he-node-ng-suite-4-3-host-1: Success (in
0:00:08)
2020-02-19 05:10:52,116::ssh.py::get_ssh_client::373::lago.ssh::DEBUG::SSH
error connecting to lago-he-node-ng-suite-4-3-engine: No existing
session
2020-02-19 05:10:52,116::ssh.py::get_ssh_client::381::lago.ssh::DEBUG::Still
got 0 tries for lago-he-node-ng-suite-4-3-engine
2020-02-19 05:10:53,117::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
task:e0b52607-6e65-4583-bf43-b615aa901cc7:Get ssh client for
lago-he-node-ng-suite-4-3-engine:
2020-02-19 05:10:53,232::log_utils.py::end_log_task::670::root::ERROR::
# [Thread-3] lago-he-node-ng-suite-4-3-engine: ERROR (in
0:00:11)
2020-02-19 05:10:53,245::log_utils.py::__exit__::607::lago.prefix::DEBUG::
File "/usr/lib/python2.7/site-packages/lago/prefix.py", line 1526, in
_collect_artifacts
vm.collect_artifacts(path, ignore_nopath)
File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line
748, in collect_artifacts
ignore_nopath=ignore_nopath
File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line
468, in extract_paths
return self.provider.extract_paths(paths, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line
259, in extract_paths
format(self.vm.name())
2020-02-19 05:10:53,245::utils.py::_ret_via_queue::63::lago.utils::DEBUG::Error
while running thread Thread-3
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/lago/utils.py", line 58, in
_ret_via_queue
queue.put({'return': func()})
File "/usr/lib/python2.7/site-packages/lago/prefix.py", line 1526,
in _collect_artifacts
vm.collect_artifacts(path, ignore_nopath)
File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line
748, in collect_artifacts
ignore_nopath=ignore_nopath
File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line
468, in extract_paths
return self.provider.extract_paths(paths, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/lago/plugins/vm.py", line
259, in extract_paths
format(self.vm.name())
ExtractPathError: Unable to extract paths from
lago-he-node-ng-suite-4-3-engine: unreachable with SSH
The original email I wrote (which was never received) follows:
Hi all,
See e.g. [1].
In lago.log [2]:
2020-01-19 05:38:06,052::log_utils.py::__exit__::611::ovirtlago.prefix::INFO::@
Run test: 008_restart_he_vm.py: Success (in 0:23:27)
...
2020-01-19 05:38:07,680::log_utils.py::__enter__::600::lago.prefix::INFO::
# [Thread-3] lago-he-basic-suite-4-3-engine:
...
2020-01-19 05:38:07,686::log_utils.py::__enter__::600::lago.ssh::DEBUG::start
task:170b4eaa-fbf5-48ca-b81a-4ddf0c9a3bd5:Get ssh client for
lago-he-basic-suite-4-3-engine:
...
2020-01-19 05:38:12,415::log_utils.py::__exit__::611::lago.prefix::INFO::
# [Thread-1] lago-he-basic-suite-4-3-host-0: Success (in
0:00:04)
2020-01-19 05:38:17,729::ssh.py::get_ssh_client::373::lago.ssh::DEBUG::SSH
error connecting to lago-he-basic-suite-4-3-engine: No existing
session
2020-01-19 05:38:17,730::ssh.py::get_ssh_client::381::lago.ssh::DEBUG::Still
got 0 tries for lago-he-basic-suite-4-3-engine
2020-01-19 05:38:18,731::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
task:170b4eaa-fbf5-48ca-b81a-4ddf0c9a3bd5:Get ssh client for
lago-he-basic-suite-4-3-engine:
2020-01-19 05:38:18,739::log_utils.py::end_log_task::670::root::ERROR::
# [Thread-3] lago-he-basic-suite-4-3-engine: ERROR (in
0:00:11)
...
ExtractPathError: Unable to extract paths from
lago-he-basic-suite-4-3-engine: unreachable with SSH
I wonder what "No existing session" might mean. Perhaps paramiko
caches the connection and, after the engine VM is restarted, does
not try to connect again? Or something similar?
Searching the net, [3] looks similar, although I am not at all sure
that we do not already do this (or something similar) in lago.
Anyway, instead of (perhaps) spending time on this, it might be
better, if possible, to explicitly expose some "closeconnection"
function in lago, to be used by OST after such an engine VM restart
(or in similar cases).
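To make the "closeconnection" idea concrete, here is a minimal sketch
in Python. It is not lago's actual code: ConnectionCache,
close_connection and FakeClient are hypothetical names, and the
connect/is_alive callables stand in for whatever lago really does. It
also assumes that lago keeps one cached client per VM, which the logs
above do not confirm. With paramiko, is_alive could plausibly check
that client.get_transport() is not None and that its is_active()
returns True.

```python
import threading


class ConnectionCache(object):
    """Cache one client per host (hypothetical, not lago's real code)."""

    def __init__(self, connect, is_alive):
        # connect(host) -> client object with a close() method
        # is_alive(client) -> True if the client can still be used
        self._connect = connect
        self._is_alive = is_alive
        self._clients = {}
        self._lock = threading.Lock()

    def get(self, host):
        """Return a usable client, reconnecting if the cached one is stale."""
        with self._lock:
            client = self._clients.get(host)
            if client is not None and not self._is_alive(client):
                self._drop(host)
                client = None
            if client is None:
                client = self._connect(host)
                self._clients[host] = client
            return client

    def close_connection(self, host):
        """Explicitly forget a host, e.g. right after restarting its VM."""
        with self._lock:
            self._drop(host)

    def _drop(self, host):
        # Caller must hold self._lock.
        client = self._clients.pop(host, None)
        if client is not None:
            try:
                client.close()
            except Exception:
                pass  # the peer may already be gone; nothing more to do


class FakeClient(object):
    """Stand-in for an SSH client, used only to demonstrate the cache."""

    def __init__(self, host):
        self.host = host
        self.alive = True

    def close(self):
        self.alive = False


if __name__ == "__main__":
    cache = ConnectionCache(FakeClient, lambda c: c.alive)
    first = cache.get("engine")
    cache.close_connection("engine")  # e.g. after the engine VM restart
    second = cache.get("engine")      # a fresh client, not the stale one
    print(first is second)
```

With something like this, OST could call close_connection() for the
engine right after restarting it, and the liveness check in get()
would cover restarts that nobody announced.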
[1] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.3/324/
[2] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.3/324/artifact/exported-artifacts/lago_logs/lago.log
[3] https://stackoverflow.com/questions/57508919/paramiko-ssh-exception-sshexception-no-existing-session
Thanks,
--
Didi