On Tue, Sep 11, 2018 at 5:28 PM, Milan Zamazal <mzamazal(a)redhat.com> wrote:
[I'm not sure whether the topic is still relevant now, after I
returned
from PTO.]
Yedidyah Bar David <didi(a)redhat.com> writes:
> On Thu, Sep 6, 2018 at 4:08 AM, <jenkins(a)jenkins.phx.ovirt.org> wrote:
>> Project:
http://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/
>> Build:
http://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/137/
>
>> Build Number: 137
>> Build Status: Still Failing
>> Triggered By: Started by timer
>>
>> -------------------------------------
>> Changes Since Last Success:
>> -------------------------------------
>> Changes for Build #107
>> [Roy Golan] Introduce openshift-on-ovirt
>>
>>
>> Changes for Build #108
>> [Roy Golan] Introduce openshift-on-ovirt
>>
>>
>> Changes for Build #109
>> [Milan Zamazal] ovf_import test
>>
>> [Barak Korren] global_setup: Only install release RPMs on CentOS
>>
>> [Barak Korren] mock_runner: Set no_proxy for proxied configs
>>
>>
>> Changes for Build #110
>> [Milan Zamazal] ovf_import test
>>
>>
>> Changes for Build #111
>> [Milan Zamazal] ovf_import test
>>
>>
>> Changes for Build #112
>> [Milan Zamazal] ovf_import test
>>
>> [Eyal Edri] adding kubevirt/containerized-data-importer to stdci v2
>>
>>
>> Changes for Build #113
>> [Milan Zamazal] ovf_import test
>>
>>
>> Changes for Build #114
>> [Milan Zamazal] ovf_import test
>>
>>
>> Changes for Build #115
>> [Milan Zamazal] ovf_import test
>>
>>
>> Changes for Build #116
>> [Milan Zamazal] ovf_import test
>>
>>
>> Changes for Build #117
>> [Milan Zamazal] ovf_import test
>>
>>
>> Changes for Build #118
>> [Milan Zamazal] ovf_import test
>>
>> [Daniel Belenky] Remove oVirt logo from build summary
>>
>>
>> Changes for Build #119
>> [Milan Zamazal] ovf_import test
>>
>> [Evgheni Dereveanchin] Add infra-puppet project
>>
>> [Daniel Belenky] Added production gerrit to push-url-map
>>
>>
>> Changes for Build #120
>> [Ondra Machacek] ansible_suite: Fix ansible suite modules names
>>
>>
>> Changes for Build #121
>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>
>>
>> Changes for Build #122
>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>
>>
>> Changes for Build #123
>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>
>>
>> Changes for Build #124
>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>
>>
>> Changes for Build #125
>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>
>>
>> Changes for Build #126
>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>
>>
>> Changes for Build #127
>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>
>> [Viktor Mihajlovski] Re-enable s390x jobs
>>
>>
>> Changes for Build #128
>> [Anton Marchukov] Removed /root/fluentd from collected artifacts.
>>
>> [Ehud Yonasi] docker_cleanup: Fixed race condition with containers
>>
>>
>> Changes for Build #129
>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>
>> [Sandro Bonazzola] system-tests: drop sbonazzo from notified list
>>
>>
>> Changes for Build #130
>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>
>>
>> Changes for Build #131
>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>
>>
>> Changes for Build #132
>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>
>>
>> Changes for Build #133
>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>
>> [Daniel Belenky] stdci_runner: increase timeout to 3 hours
>>
>>
>> Changes for Build #134
>> [Milan Zamazal] Disable verify_backup_snapshot_removed for clusters < 4.2
>>
>>
>> Changes for Build #135
>> [Galit] Updated ovirt 4.2 repo file
>>
>>
>> Changes for Build #136
>> [Gal Ben Haim] Adding dr suite
>>
>>
>> Changes for Build #137
>> [Gal Ben Haim] Adding dr suite
>>
>> [Barak Korren] Remove the populate_mock function from mock_runner
>>
>> [Daniel Belenky] mock_runner: store shell cmd in a variable
>>
>> [Barak Korren] mock_runner: Added timeout param
>>
>> [Barak Korren] Make whitelist repo configurable via env vars
>>
>> [Daniel Belenky] stdci_runner: let mock_runner manage timeout
>>
>> [Greg Sheremeta] remove 4.1 change queue from ovirt-engine-nodejs config
>>
>>
>>
>>
>> -----------------
>> Failed Tests:
>> -----------------
>> 1 tests failed.
>> FAILED: 004_basic_sanity.verify_suspend_resume_vm0_40
>>
>> Error Message:
>> 0 != 1
>> -------------------- >> begin captured logging <<
--------------------
>> lago.ssh: DEBUG: start task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh
>> client for lago-compat-4-0-suite-master-host-0:
>> lago.ssh: DEBUG: end task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client
>> for lago-compat-4-0-suite-master-host-0:
>> lago.ssh: DEBUG: Running 414b4eae on lago-compat-4-0-suite-master-host-0: awk
>> 'BEGIN { n = 0; } $1 + " " + $2 > "IOError: [Errno"
&& $0 ~ /START
>> create\(.*hiberVolHandle/ { n = n + 1; } END { print n; }'
>> /var/log/vdsm/vdsm.log
>> lago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 returned
with 0
>> lago.ssh: DEBUG: Command 414b4eae on lago-compat-4-0-suite-master-host-0 output:
>> 0
>
> This now happened to me too, when running OST manually on my machine,
> basic suite master.
>
> vdsm.log [1] has many occurrences of things like:
>
> 2018-09-05 21:02:49,276-0400 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer]
> Calling 'Host.getStats' in bridge with {} (__init__:329)
> 2018-09-05 21:02:49,277-0400 INFO (jsonrpc/5) [api.host] START
> getStats() from=::ffff:192.168.200.2,40112 (api:47)
> 2018-09-05 21:02:49,281-0400 DEBUG (jsonrpc/5) [root] cannot read eth0
> speed (nic:42)
> 2018-09-05 21:02:49,282-0400 DEBUG (jsonrpc/5) [root] cannot read eth1
> speed (nic:42)
> 2018-09-05 21:02:49,283-0400 DEBUG (jsonrpc/5) [root] cannot read eth2
> speed (nic:42)
> 2018-09-05 21:02:49,284-0400 DEBUG (jsonrpc/5) [root] cannot read eth3
> speed (nic:42)
> 2018-09-05 21:02:49,289-0400 INFO (jsonrpc/5) [api.host] FINISH
> getStats error=[Errno 22] Invalid argument
> from=::ffff:192.168.200.2,40112 (api:51)
> 2018-09-05 21:02:49,289-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer]
> Internal server error (__init__:350)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
> 345, in _handle_request
> res = method(**params)
> File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line
> 202, in _dynamicMethod
> result = fn(*methodArgs)
> File "<string>", line 2, in getStats
> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in
method
> ret = func(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1407, in
getStats
> multipath=True)}
> File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 49,
> in get_stats
> decStats = stats.produce(first_sample, last_sample)
> File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 71,
> in produce
> stats.update(get_interfaces_stats())
> File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line
> 153, in get_interfaces_stats
> return net_api.network_stats()
> File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line
> 63, in network_stats
> return netstats.report()
> File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py",
> line 31, in report
> stats = link_stats.report()
> File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py",
> line 41, in report
> speed = vlan.speed(i.device)
> File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py",
> line 36, in speed
> dev_speed = nic.read_speed_using_sysfs(dev_name)
> File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py",
> line 48, in read_speed_using_sysfs
> s = int(f.read())
> IOError: [Errno 22] Invalid argument
>
> The test verify_suspend_resume_vm0 seems to try to find a line in
> vdsm.log that appears _after_ the place it started suspend_resume_vm0.
> To search for this line, suspend_resume_vm0 does 'tail -1' on the file
> and keeps the timestamp (first two words). But with tracebacks such as
> above, instead of a timestamp it gets "IOError: [Errno".
>
> Not sure how to handle. Some options I can think of:
>
> 1. Make vdsm add a timestamp prefix to such tracebacks (and to
> everything, perhaps). This will fix the problem, and might be useful
> on its own, but can also be annoying.
I think it would be more annoying than useful. The initial timestamp
applies to the whole traceback so there would be no added information,
just making it possibly better readable to some parsers and less
readable to humans.
Agreed.
> 2. Change the logic in the test somehow, e.g.:
> - instead of 'tail -1', take e.g. 'tail -100' and search for the
last
> line that has a timestamp
That would be possible, but I actually like the fact that those
tracebacks were detected, although by side effect. If there's something
unexpected in the log, let's fail.
OK, see below.
> - check the file size and on the next one continue starting from this
> point (assuming OST will not rotate vdsm.log, is that safe?)
> - rotate vdsm.log prior to running the test (might be problematic, no
> idea) and then check whole new file
>
> 3. Perhaps patch vdsm to not log the traceback, e.g.
>
https://gerrit.ovirt.org/94197
Yes, fixing the problem is the right solution, there shouldn't be
tracebacks in the log without (good) reason.
I wasn't aware of the fact (?) that this is indeed the policy.
(
As a side note, in at least engine-setup, we do log tracebacks in
some cases and consider this ok. See also this, especially the last
few paragraphs:
https://ovirt.org/develop/developer-guide/engine/otopi/#logs
)
I think Petr Horáček is
working on it.
Very well. Cced you and him on above patch. If his fix is expected soon,
I don't mind abandoning mine (above).
> Probably all of these can be done as well.
>
> [1]
https://jenkins.ovirt.org/job/ovirt-system-tests_compat-4.0-suite-master/...
>
>>
>> --------------------- >> end captured logging <<
---------------------
>>
>> Stack Trace:
>> File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
>> testMethod()
>> File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in
runTest
>> self.test(*self.arg)
>> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
142, in wrapped_test
>> test()
>> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
60, in wrapper
>> return func(get_test_prefix(), *args, **kwargs)
>> File
"/home/jenkins/workspace/ovirt-system-tests_compat-4.0-suite-master/ovirt-system-tests/compat-4.0-suite-master/test-scenarios/004_basic_sanity.py",
line 1140, in verify_suspend_resume_vm0
>> nt.assert_equals(log_line_count('START create\(.*' + identifier), 1)
>> File "/usr/lib64/python2.7/unittest/case.py", line 553, in
assertEqual
>> assertion_func(first, second, msg=msg)
>> File "/usr/lib64/python2.7/unittest/case.py", line 546, in
_baseAssertEqual
>> raise self.failureException(msg)
>> '0 != 1\n-------------------- >> begin captured logging <<
--------------------\nlago.ssh: DEBUG: start task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get
ssh client for lago-compat-4-0-suite-master-host-0:\nlago.ssh: DEBUG: end
task:a60df998-f4ba-4989-9e5f-f234f8409bd0:Get ssh client for
lago-compat-4-0-suite-master-host-0:\nlago.ssh: DEBUG: Running 414b4eae on
lago-compat-4-0-suite-master-host-0: awk \'BEGIN { n = 0; } $1 + " " + $2
> "IOError: [Errno" && $0 ~ /START create\\(.*hiberVolHandle/ { n = n
+ 1; } END { print n; }\' /var/log/vdsm/vdsm.log\nlago.ssh: DEBUG: Command 414b4eae on
lago-compat-4-0-suite-master-host-0 returned with 0\nlago.ssh: DEBUG: Command 414b4eae on
lago-compat-4-0-suite-master-host-0 output:\n 0\n\n--------------------- >> end
captured logging << ---------------------'
>> _______________________________________________
>> Infra mailing list -- infra(a)ovirt.org
>> To unsubscribe send an email to infra-leave(a)ovirt.org
>> Privacy Statement:
https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
https://lists.ovirt.org/archives/list/infra@ovirt.org/message/PTTD52JZ7G7...
>>
--
Didi