[I'm not sure whether the topic is still relevant now, after I returned
from PTO.]
Yedidyah Bar David <didi(a)redhat.com> writes:
On Thu, Sep 6, 2018 at 4:08 AM,
<jenkins(a)jenkins.phx.ovirt.org> wrote:
> Project:
http://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/
> Build:
http://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/137/
> Build Number: 137
> Build Status: Still Failing
> Triggered By: Started by timer
>
> -------------------------------------
> Changes Since Last Success:
> -------------------------------------
> Changes for Build #107
> [Roy Golan] Introduce openshift-on-ovirt
>
>
> Changes for Build #108
> [Roy Golan] Introduce openshift-on-ovirt
>
>
> Changes for Build #109
> [Milan Zamazal] ovf_import test
>
> [Barak Korren] global_setup: Only install release RPMs on CentOS
>
> [Barak Korren] mock_runner: Set no_proxy for proxied configs
>
>
> Changes for Build #110
> [Milan Zamazal] ovf_import test
>
>
> Changes for Build #111
> [Milan Zamazal] ovf_import test
>
>
> Changes for Build #112
> [Milan Zamazal] ovf_import test
>
> [Eyal Edri] adding kubevirt/containerized-data-importer to stdci v2
>
>
> Changes for Build #113
> [Milan Zamazal] ovf_import test
>
>
> Changes for Build #114
> [Milan Zamazal] ovf_import test
>
>
> Changes for Build #115
> [Milan Zamazal] ovf_import test
>
>
> Changes for Build #116
> [Milan Zamazal] ovf_import test
>
>
> Changes for Build #117
> [Milan Zamazal] ovf_import test
>
>
> Changes for Build #118
> [Milan Zamazal] ovf_import test
>
> [Daniel Belenky] Remove oVirt logo from build summary
>
>
> Changes for Build #119
> [Milan Zamazal] ovf_import test
>
> [Evgheni Dereveanchin] Add infra-puppet project
>
> [Daniel Belenky] Added production gerrit to push-url-map
>
>
> Changes for Build #120
> [Ondra Machacek] ansible_suite: Fix ansible suite modules names
>
>
> Changes for Build #121
> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>
>
> Changes for Build #122
> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>
>
> Changes for Build #123
> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>
>
> Changes for Build #124
> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>
>
> Changes for Build #125
> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>
>
> Changes for Build #126
> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>
>
> Changes for Build #127
> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>
> [Viktor Mihajlovski] Re-enable s390x jobs
>
>
> Changes for Build #128
> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>
> [Ehud Yonasi] docker_cleanup: Fixed race condition with containers
>
>
> Changes for Build #129
> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>
> [Sandro Bonazzola] system-tests: drop sbonazzo from notified list
>
>
> Changes for Build #130
> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>
>
> Changes for Build #131
> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>
>
> Changes for Build #132
> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>
>
> Changes for Build #133
> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>
> [Daniel Belenky] stdci_runner: increase timeout to 3 hours
>
>
> Changes for Build #134
> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>
>
> Changes for Build #135
> [Galit] Updated ovirt 4.2 repo file
>
>
> Changes for Build #136
> [Gal Ben Haim] Adding dr suite
>
>
> Changes for Build #137
> [Gal Ben Haim] Adding dr suite
>
> [Barak Korren] Remove the populate_mock function from mock_runner
>
> [Daniel Belenky] mock_runner: store shell cmd in a variable
>
> [Barak Korren] mock_runner: Added timeout param
>
> [Barak Korren] Make whitelist repo configurable via env vars
>
> [Daniel Belenky] stdci_runner: let mock_runner manage timeout
>
> [Greg Sheremeta] remove 4.1 change queue from ovirt-engine-nodejs config
>
>
>
>
> -----------------
> Failed Tests:
> -----------------
> 1 tests failed.
> FAILED: 004_basic_sanity.verify_suspend_resume_vm0_40
>
> Error Message:
> 0 != 1
> -------------------- >> begin captured logging << --------------------
> lago.ssh: DEBUG: start task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh
> client for lago-compat-4-0-suite-master-host-0:
> lago.ssh: DEBUG: end task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client
> for lago-compat-4-0-suite-master-host-0:
> lago.ssh: DEBUG: Running 414b4eae on lago-compat-4-0-suite-master-host-0: awk
> 'BEGIN { n = 0; } $1 + " " + $2 > "IOError: [Errno"
&& $0 ~ /START
> create\(.*hiberVolHandle/ { n = n + 1; } END { print n; }'
> /var/log/vdsm/vdsm.log
> lago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 returned
with 0
> lago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 output:
> 0
This now happened to me too, when running OST manually on my machine,
basic suite master.
vdsm.log [1] has many occurrences of things like:
2018-09-05 21:02:49,276-0400 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer]
Calling 'Host.getStats' in bridge with {} (__init__:329)
2018-09-05 21:02:49,277-0400 INFO (jsonrpc/5) [api.host] START
getStats() from=::ffff:192.168.200.2,40112 (api:47)
2018-09-05 21:02:49,281-0400 DEBUG (jsonrpc/5) [root] cannot read eth0
speed (nic:42)
2018-09-05 21:02:49,282-0400 DEBUG (jsonrpc/5) [root] cannot read eth1
speed (nic:42)
2018-09-05 21:02:49,283-0400 DEBUG (jsonrpc/5) [root] cannot read eth2
speed (nic:42)
2018-09-05 21:02:49,284-0400 DEBUG (jsonrpc/5) [root] cannot read eth3
speed (nic:42)
2018-09-05 21:02:49,289-0400 INFO (jsonrpc/5) [api.host] FINISH
getStats error=[Errno 22] Invalid argument
from=::ffff:192.168.200.2,40112 (api:51)
2018-09-05 21:02:49,289-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer]
Internal server error (__init__:350)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
345, in _handle_request
res = method(**params)
File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line
202, in _dynamicMethod
result = fn(*methodArgs)
File "<string>", line 2, in getStats
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in
method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1407, in getStats
multipath=True)}
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 49,
in get_stats
decStats = stats.produce(first_sample, last_sample)
File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 71,
in produce
stats.update(get_interfaces_stats())
File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line
153, in get_interfaces_stats
return net_api.network_stats()
File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line
63, in network_stats
return netstats.report()
File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py",
line 31, in report
stats = link_stats.report()
File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py",
line 41, in report
speed = vlan.speed(i.device)
File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py",
line 36, in speed
dev_speed = nic.read_speed_using_sysfs(dev_name)
File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py",
line 48, in read_speed_using_sysfs
s = int(f.read())
IOError: [Errno 22] Invalid argument
The test verify_suspend_resume_vm0 seems to try to find a line in
vdsm.log that appears _after_ the place it started suspend_resume_vm0.
To search for this line, suspend_resume_vm0 does 'tail -1' on the file
and keeps the timestamp (first two words). But with tracebacks such as
above, instead of a timestamp it gets "IOError: [Errno".
Not sure how to handle. Some options I can think of:
1. Make vdsm add a timestamp prefix to such tracebacks (and to
everything, perhaps). This will fix the problem, and might be useful
on its own, but can also be annoying.
I think it would be more annoying than useful. The initial timestamp
applies to the whole traceback so there would be no added information,
just making it possibly better readable to some parsers and less
readable to humans.
2. Change the logic in the test somehow, e.g.:
- instead of 'tail -1', take e.g. 'tail -100' and search for the last
line that has a timestamp
That would be possible, but I actually like the fact that those
tracebacks were detected, although by side effect. If there's something
unexpected in the log, let's fail.
- check the file size and on the next one continue starting from
this
point (assuming OST will not rotate vdsm.log, is that safe?)
- rotate vdsm.log prior to running the test (might be problematic, no
idea) and then check whole new file
3. Perhaps patch vdsm to not log the traceback, e.g.
https://gerrit.ovirt.org/94197
Yes, fixing the problem is the right solution, there shouldn't be
tracebacks in the log without (good) reason. I think Petr Horáček is
working on it.
> Probably all of these can be done as well.
>
> [1]
https://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/...
>
>>
>> --------------------- >> end captured logging <<
---------------------
>>
>> Stack Trace:
>> File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
>> testMethod()
>> File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in
runTest
>> self.test(*self.arg)
>> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
142, in wrapped_test
>> test()
>> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
60, in wrapper
>> return func(get_test_prefix(), *args, **kwargs)
>> File
"/home/jenkins/workspace/ovirt-system-tests_compat-4.0-suite-master/ovirt-system-tests/compat-4.0-suite-master/test-scenarios/004_basic_sanity.py",
line 1140, in verify_suspend_resume_vm0
>> nt.assert_equals(log_line_count('START create\(.*' + identifier), 1)
>> File "/usr/lib64/python2.7/unittest/case.py", line 553, in
assertEqual
>> assertion_func(first, second, msg=msg)
>> File "/usr/lib64/python2.7/unittest/case.py", line 546, in
_baseAssertEqual
>> raise self.failureException(msg)
>> '0 != 1\n-------------------- >> begin captured logging <<
--------------------\nlago.ssh: DEBUG: start task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get
ssh client for lago-compat-4-0-suite-master-host-0:\nlago.ssh: DEBUG: end
task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client for
lago-compat-4-0-suite-master-host-0:\nlago.ssh: DEBUG: Running 414b4eae on
lago-compat-4-0-suite-master-host-0: awk \'BEGIN { n = 0; } $1 + " " + $2
> "IOError: [Errno" && $0 ~ /START create\\(.*hiberVolHandle/ { n = n
+ 1; } END { print n; }\' /var/log/vdsm/vdsm.log\nlago.ssh: DEBUG: Command 414b4eae on
lago-compat-4-0-suite-master-host-0 returned with 0\nlago.ssh: DEBUG: Command 414b4eae on
lago-compat-4-0-suite-master-host-0 output:\n 0\n\n--------------------- >> end
captured logging << ---------------------'
>> _______________________________________________
>> Infra mailing list -- infra(a)ovirt.org
>> To unsubscribe send an email to infra-leave(a)ovirt.org
>> Privacy Statement:
https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
https://lists.ovirt.org/archives/list/infra@ovirt.org/message/PTTD52JZ7G7...
>>