For the failed job, the engine didn't even try to deploy on host-1:
Martin, do you know what could be the reason for that?
I can see in the logs for both successful and unsuccessful basic-suite-4.3 runs, that there is no 'ntpdate' on host-1:
2019-03-25 10:14:46,350::ssh.py::ssh::58::lago.ssh::DEBUG::Running d0c49b54 on lago-basic-suite-4-3-host-1: ntpdate -4 lago-basic-suite-4-3-engine
2019-03-25 10:14:46,383::ssh.py::ssh::81::lago.ssh::DEBUG::Command d0c49b54 on lago-basic-suite-4-3-host-1 returned with 127
2019-03-25 10:14:46,384::ssh.py::ssh::96::lago.ssh::DEBUG::Command d0c49b54 on lago-basic-suite-4-3-host-1 errors: bash: ntpdate: command not found
On host-0 everything is ok:
2019-03-25 10:14:46,917::ssh.py::ssh::58::lago.ssh::DEBUG::Running d11b2a64 on lago-basic-suite-4-3-host-0: ntpdate -4 lago-basic-suite-4-3-engine
2019-03-25 10:14:53,088::ssh.py::ssh::81::lago.ssh::DEBUG::Command d11b2a64 on lago-basic-suite-4-3-host-0 returned with 0
2019-03-25 10:14:53,088::ssh.py::ssh::89::lago.ssh::DEBUG::Command d11b2a64 on lago-basic-suite-4-3-host-0 output: 25 Mar 06:14:53 ntpdate[6646]: adjust time server 192.168.202.2 offset 0.017150 sec
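For comparing hosts across runs, the non-zero exit codes can be pulled out of the lago log with a few lines of Python. This is only a sketch: it assumes the "Command <id> on <host> returned with <rc>" line format seen in the excerpts above, and `failed_commands` is a hypothetical helper name, not something lago provides.

```python
import re

# Matches lago.ssh lines such as:
#   ...::DEBUG::Command d0c49b54 on lago-basic-suite-4-3-host-1 returned with 127
# The format is assumed from the log excerpts quoted in this thread.
RETURNED_RE = re.compile(
    r'Command (?P<cmd_id>\w+) on (?P<host>\S+) returned with (?P<rc>\d+)'
)


def failed_commands(log_lines):
    """Return (host, command id, return code) for every non-zero exit."""
    failures = []
    for line in log_lines:
        match = RETURNED_RE.search(line)
        if match and int(match.group('rc')) != 0:
            failures.append((
                match.group('host'),
                match.group('cmd_id'),
                int(match.group('rc')),
            ))
    return failures
```

Feeding it the lines of a lago log would list the ntpdate call on host-1, since 127 is the shell's exit status for "command not found".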
Still fails, now on a different component (ovirt-web-ui-extentions).
On Fri, Mar 22, 2019 at 3:59 PM Dan Kenigsberg <danken@redhat.com> wrote:
_______________________________________________
On Fri, Mar 22, 2019 at 3:21 PM Marcin Sobczyk <msobczyk@redhat.com> wrote:
Dafna,
in 'verify_add_hosts' we specifically wait for a single host to be up, with a timeout:

    up_hosts = hosts_service.list(search='datacenter={} AND status=up'.format(DC_NAME))
    if len(up_hosts):
        return True

The log files say that it took ~50 secs for one of the hosts to be up (seems reasonable) and no timeout is being reported.
Just after running 'verify_add_hosts', we run 'add_master_storage_domain', which calls '_hosts_in_dc' function.
That function does the exact same check, but it fails:

    hosts = hosts_service.list(search='datacenter={} AND status=up'.format(dc_name))
    if hosts:
        if random_host:
            return random.choice(hosts)

I don't think it is relevant to our current failure; but I consider random_host=True as a bad practice. As if we do not have enough moving parts, we are adding intentional randomness. Reproducibility is far more important than coverage - particularly for a shared system test like OST.

        else:
            return sorted(hosts, key=lambda host: host.name)
    raise RuntimeError('Could not find hosts that are up in DC %s' % dc_name)

I'm also not able to reproduce this issue locally on my server. The investigation continues...
I think that it would be fair to take the filtering by host state out of Engine and into the test, where we can easily log the current status of each host. Then we'd have better understanding on the next failure.
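A rough sketch of what that could look like: list every host in the DC without the status filter, log each host's status, and do the filtering in the test. The `pick_up_host` name, the `Host` stand-in and the plain 'up' string are assumptions for illustration only; with the real SDK the list would presumably come from hosts_service.list(search='datacenter={}'.format(dc_name)) and the comparison would be against the SDK's host status enum. It also returns the first host by name rather than a random one, so that runs stay reproducible.

```python
import logging
from collections import namedtuple

LOGGER = logging.getLogger(__name__)

# Stand-in for the SDK host object; only the two attributes used below.
Host = namedtuple('Host', ['name', 'status'])


def pick_up_host(hosts, dc_name, up_status='up'):
    """Log every host's status, then return the first 'up' host by name."""
    for host in hosts:
        LOGGER.info('DC %s: host %s is %s', dc_name, host.name, host.status)

    # Filter in the test instead of in the Engine search query.
    up_hosts = sorted(
        (host for host in hosts if host.status == up_status),
        key=lambda host: host.name,
    )
    if not up_hosts:
        # Include the observed statuses so the next failure explains itself.
        statuses = {host.name: host.status for host in hosts}
        raise RuntimeError(
            'Could not find hosts that are up in DC %s: %r' % (dc_name, statuses)
        )
    return up_hosts[0]
```

With this shape, a failure such as the one in add_master_storage_domain would report which state each host was actually in, instead of only saying that none was up.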
_______________________________________________
On 3/22/19 1:17 PM, Marcin Sobczyk wrote:
Hi,
sure, I'm on it - it's weird though, I did run the 4.3 basic suite for this patch manually and everything was ok.
On 3/22/19 1:05 PM, Dafna Ron wrote:
Hi,
We are failing branch 4.3 for test: 002_bootstrap.add_master_storage_domain
It seems that on one of the hosts, vdsm is not starting; there is nothing in vdsm.log or in supervdsm.log
CQ identified this patch as the suspected root cause:
https://gerrit.ovirt.org/#/c/98748/ - vdsm: client: Add support for flow id
Milan, Marcin, can you please have a look?
full logs:
the only error I can see is about host not being up (makes sense as vdsm is not running)
Stacktrace
  File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
    testMethod()
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 142, in wrapped_test
    test()
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 60, in wrapper
    return func(get_test_prefix(), *args, **kwargs)
  File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 417, in add_master_storage_domain
    add_iscsi_storage_domain(prefix)
  File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 561, in add_iscsi_storage_domain
    host=_random_host_from_dc(api, DC_NAME),
  File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 122, in _random_host_from_dc
    return _hosts_in_dc(api, dc_name, True)
  File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 119, in _hosts_in_dc
    raise RuntimeError('Could not find hosts that are up in DC %s' % dc_name)
'Could not find hosts that are up in DC test-dc\n-------------------- >> begin captured logging << --------------------\nlago.ssh: DEBUG: start task:937bdea7-a2a3-47ad-9383-36647ea37ddf:Get ssh client for lago-basic-suite-4-3-engine:\nlago.ssh: DEBUG: end task:937bdea7-a2a3-47ad-9383-36647ea37ddf:Get ssh client for lago-basic-suite-4-3-engine:\nlago.ssh: DEBUG: Running c07b5ee2 on lago-basic-suite-4-3-engine: cat /root/multipath.txt\nlago.ssh: DEBUG: Command c07b5ee2 on lago-basic-suite-4-3-engine returned with 0\nlago.ssh: DEBUG: Command c07b5ee2 on lago-basic-suite-4-3-engine output:\n 
3600140516f88cafa71243648ea218995\n360014053e28f60001764fed9978ec4b3\n360014059edc777770114a6484891dcf1\n36001405d93d8585a50d43a4ad0bd8d19\n36001405e31361631de14bcf87d43e55a\n\n-----------
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/J4NCHXTK5ZYLXWW36DZKAUL5DN7WBNW4/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/ULS4OKU2YZFDQT5EDFYKLW5GFA52YZ7U/
--
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)