
[I'm not sure whether the topic is still relevant now, after I returned from PTO.] Yedidyah Bar David <didi@redhat.com> writes:
On Thu, Sep 6, 2018 at 4:08 AM, <jenkins@jenkins.phx.ovirt.org> wrote:
Project: http://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/ Build: http://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/137/
Build Number: 137 Build Status: Still Failing Triggered By: Started by timer
------------------------------------- Changes Since Last Success: ------------------------------------- Changes for Build #107 [Roy Golan] Introduce openshift-on-ovirt
Changes for Build #108 [Roy Golan] Introduce openshift-on-ovirt
Changes for Build #109 [Milan Zamazal] ovf_import test
[Barak Korren] global_setup: Only install release RPMs on CentOS
[Barak Korren] mock_runner: Set no_proxy for proxied configs
Changes for Build #110 [Milan Zamazal] ovf_import test
Changes for Build #111 [Milan Zamazal] ovf_import test
Changes for Build #112 [Milan Zamazal] ovf_import test
[Eyal Edri] adding kubevirt/containerized-data-importer to stdci v2
Changes for Build #113 [Milan Zamazal] ovf_import test
Changes for Build #114 [Milan Zamazal] ovf_import test
Changes for Build #115 [Milan Zamazal] ovf_import test
Changes for Build #116 [Milan Zamazal] ovf_import test
Changes for Build #117 [Milan Zamazal] ovf_import test
Changes for Build #118 [Milan Zamazal] ovf_import test
[Daniel Belenky] Remove oVirt logo from build summary
Changes for Build #119 [Milan Zamazal] ovf_import test
[Evgheni Dereveanchin] Add infra-puppet project
[Daniel Belenky] Added production gerrit to push-url-map
Changes for Build #120 [Ondra Machacek] ansible_suite: Fix ansible suite modules names
Changes for Build #121 [Anton Marchukov] Removed /root/fluentd from collected artifacts.
Changes for Build #122 [Anton Marchukov] Removed /root/fluentd from collected artifacts.
Changes for Build #123 [Anton Marchukov] Removed /root/fluentd from collected artifacts.
Changes for Build #124 [Anton Marchukov] Removed /root/fluentd from collected artifacts.
Changes for Build #125 [Anton Marchukov] Removed /root/fluentd from collected artifacts.
Changes for Build #126 [Anton Marchukov] Removed /root/fluentd from collected artifacts.
Changes for Build #127 [Anton Marchukov] Removed /root/fluentd from collected artifacts.
[Viktor Mihajlovski] Re-enable s390x jobs
Changes for Build #128 [Anton Marchukov] Removed /root/fluentd from collected artifacts.
[Ehud Yonasi] docker_cleanup: Fixed race condition with containers
Changes for Build #129 [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
[Sandro Bonazzola] system-tests: drop sbonazzo from notified list
Changes for Build #130 [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
Changes for Build #131 [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
Changes for Build #132 [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
Changes for Build #133 [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
[Daniel Belenky] stdci_runner: increase timeout to 3 hours
Changes for Build #134 [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
Changes for Build #135 [Galit] Updated ovirt 4.2 repo file
Changes for Build #136 [Gal Ben Haim] Adding dr suite
Changes for Build #137 [Gal Ben Haim] Adding dr suite
[Barak Korren] Remove the populate_mock function from mock_runner
[Daniel Belenky] mock_runner: store shell cmd in a variable
[Barak Korren] mock_runner: Added timeout param
[Barak Korren] Make whitelist repo configurable via env vars
[Daniel Belenky] stdci_runner: let mock_runner manage timeout
[Greg Sheremeta] remove 4.1 change queue from ovirt-engine-nodejs config
----------------- Failed Tests: ----------------- 1 tests failed. FAILED: 004_basic_sanity.verify_suspend_resume_vm0_40
Error Message: 0 != 1 -------------------- >> begin captured logging << -------------------- lago.ssh: DEBUG: start task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client for lago-compat-4-0-suite-master-host-0: lago.ssh: DEBUG: end task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client for lago-compat-4-0-suite-master-host-0: lago.ssh: DEBUG: Running 414b4eae on lago-compat-4-0-suite-master-host-0: awk 'BEGIN { n = 0; } $1 + " " + $2 > "IOError: [Errno" && $0 ~ /START create\(.*hiberVolHandle/ { n = n + 1; } END { print n; }' /var/log/vdsm/vdsm.log lago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 returned with 0 lago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 output: 0
This now happened to me too, when running OST manually on my machine, basic suite master.
vdsm.log [1] has many occurrences of things like:
2018-09-05 21:02:49,276-0400 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer] Calling 'Host.getStats' in bridge with {} (__init__:329) 2018-09-05 21:02:49,277-0400 INFO (jsonrpc/5) [api.host] START getStats() from=::ffff:192.168.200.2,40112 (api:47) 2018-09-05 21:02:49,281-0400 DEBUG (jsonrpc/5) [root] cannot read eth0 speed (nic:42) 2018-09-05 21:02:49,282-0400 DEBUG (jsonrpc/5) [root] cannot read eth1 speed (nic:42) 2018-09-05 21:02:49,283-0400 DEBUG (jsonrpc/5) [root] cannot read eth2 speed (nic:42) 2018-09-05 21:02:49,284-0400 DEBUG (jsonrpc/5) [root] cannot read eth3 speed (nic:42) 2018-09-05 21:02:49,289-0400 INFO (jsonrpc/5) [api.host] FINISH getStats error=[Errno 22] Invalid argument from=::ffff:192.168.200.2,40112 (api:51) 2018-09-05 21:02:49,289-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer] Internal server error (__init__:350) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in _handle_request res = method(**params) File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod result = fn(*methodArgs) File "<string>", line 2, in getStats File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1407, in getStats multipath=True)} File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 49, in get_stats decStats = stats.produce(first_sample, last_sample) File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 71, in produce stats.update(get_interfaces_stats()) File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 153, in get_interfaces_stats return net_api.network_stats() File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 63, in network_stats return netstats.report() File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py", line 31, in report stats = link_stats.report() File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py", line 41, in report speed = vlan.speed(i.device) File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py", line 36, in speed dev_speed = nic.read_speed_using_sysfs(dev_name) File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py", line 48, in read_speed_using_sysfs s = int(f.read()) IOError: [Errno 22] Invalid argument
The test verify_suspend_resume_vm0 seems to try to find a line in vdsm.log that appears _after_ the place it started suspend_resume_vm0. To search for this line, suspend_resume_vm0 does 'tail -1' on the file and keeps the timestamp (first two words). But with tracebacks such as above, instead of a timestamp it gets "IOError: [Errno".
Not sure how to handle. Some options I can think of:
1. Make vdsm add a timestamp prefix to such tracebacks (and to everything, perhaps). This will fix the problem, and might be useful on its own, but can also be annoying.
I think it would be more annoying than useful. The initial timestamp applies to the whole traceback so there would be no added information, just making it possibly better readable to some parsers and less readable to humans.
2. Change the logic in the test somehow, e.g.: - instead of 'tail -1', take e.g. 'tail -100' and search for the last line that has a timestamp
That would be possible, but I actually like the fact that those tracebacks were detected, although by side effect. If there's something unexpected in the log, let's fail.
- check the file size and on the next one continue starting from this point (assuming OST will not rotate vdsm.log, is that safe?) - rotate vdsm.log prior to running the test (might be problematic, no idea) and then check whole new file
3. Perhaps patch vdsm to not log the traceback, e.g. https://gerrit.ovirt.org/94197
Yes, fixing the problem is the right solution, there shouldn't be tracebacks in the log without (good) reason. I think Petr Horáček is working on it.
Probably all of these can be done as well.
[1] https://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/137...
--------------------- >> end captured logging << ---------------------
Stack Trace: File "/usr/lib64/python2.7/unittest/case.py", line 369, in run testMethod() File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 142, in wrapped_test test() File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 60, in wrapper return func(get_test_prefix(), *args, **kwargs) File "/home/jenkins/workspace/ovirt-system-tests_compat-4.0-suite-master/ovirt-system-tests/compat-4.0-suite-master/test-scenarios/004_basic_sanity.py", line 1140, in verify_suspend_resume_vm0 nt.assert_equals(log_line_count('START create\(.*' + identifier), 1) File "/usr/lib64/python2.7/unittest/case.py", line 553, in assertEqual assertion_func(first, second, msg=msg) File "/usr/lib64/python2.7/unittest/case.py", line 546, in _baseAssertEqual raise self.failureException(msg) '0 != 1\n-------------------- >> begin captured logging << --------------------\nlago.ssh: DEBUG: start task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client for lago-compat-4-0-suite-master-host-0:\nlago.ssh: DEBUG: end task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client for lago-compat-4-0-suite-master-host-0:\nlago.ssh: DEBUG: Running 414b4eae on lago-compat-4-0-suite-master-host-0: awk \'BEGIN { n = 0; } $1 + " " + $2 > "IOError: [Errno" && $0 ~ /START create\\(.*hiberVolHandle/ { n = n + 1; } END { print n; }\' /var/log/vdsm/vdsm.log\nlago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 returned with 0\nlago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 output:\n 0\n\n--------------------- >> end captured logging << ---------------------' _______________________________________________ Infra mailing list -- infra@ovirt.org To unsubscribe send an email to infra-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/infra@ovirt.org/message/PTTD52JZ7G72DX...