[ OST Failure Report ] [ oVirt Master (ovirt-ansible-manageiq) ] [
11-09-2018 ] [ metrics_bootstrap.metrics_and_log_collector ]
by Dafna Ron
Hi,
we have a failure in basic suite for test
metrics_bootstrap.metrics_and_log_collector.
We failed to configure the hosts for metrics collection which causes
failure in the test.
The change that is reported as cause of failure is:
https://github.com/oVirt/ovirt-ansible-manageiq/commit/
6697fc98f934beb9fece6d0224c0b013411b84e6
You can see the full logs here:
https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3078/artifact...
error:
PLAY RECAP
*********************************************************************
lago-basic-suite-4-2-host-0 : ok=1 changed=0 unreachable=0
failed=1
lago-basic-suite-4-2-host-1 : ok=1 changed=0 unreachable=0
failed=1
localhost : ok=53 changed=24 unreachable=0
failed=0
2018-09-11 09:06:18,713::ssh.py::ssh::96::lago.ssh::DEBUG::Command d69e71be
on lago-basic-suite-4-2-engine errors:
[WARNING]: Could not match supplied host pattern, ignoring: metrics_store
2018-09-11
09:06:18,713::utils.py::_ret_via_queue::60::lago.utils::ERROR::Error while
running thread
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/lago/utils.py", line 58, in
_ret_via_queue
queue.put({'return': func()})
File
"/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/003_00_metrics_bootstrap.py",
line 54, in configure_metrics
' Exit code is %s' % result.code
File "/usr/lib/python2.7/site-packages/nose/tools/trivial.py", line 29,
in eq_
raise AssertionError(msg or "%r != %r" % (a, b))
AssertionError: Configuring ovirt machines for metrics failed. Exit code is
2
2018-09-11 09:06:38,897::ssh.py::ssh::81::lago.ssh::DEBUG::Command d648c70a
on lago-basic-suite-4-2-engine returned with 0
2018-09-11 09:06:38,897::ssh.py::ssh::89::lago.ssh::DEBUG::Command d648c70a
on lago-basic-suite-4-2-engine output:
This command will collect system configuration and diagnostic
information from this system.
6 years, 2 months
Change in barak-test[master]: jenkins integ test
by review@gerrit-staging.phx.ovirt.org
From Evgheni Dereveanchin <ederevea(a)redhat.com>:
Evgheni Dereveanchin has posted comments on this change. ( https://gerrit-staging.phx.ovirt.org/17 )
Change subject: jenkins integ test
......................................................................
Patch Set 1:
gerrit el7 migration test
--
To view, visit https://gerrit-staging.phx.ovirt.org/17
To unsubscribe, visit https://gerrit-staging.phx.ovirt.org/settings
Gerrit-Project: barak-test
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50acc69c8c5b6b3e03f3ced23d72c883ac4db2c2
Gerrit-Change-Number: 17
Gerrit-PatchSet: 1
Gerrit-Owner: Evgheni Dereveanchin <ederevea(a)redhat.com>
Gerrit-Reviewer: Evgheni Dereveanchin <ederevea(a)redhat.com>
Gerrit-Reviewer: Jenkins CI <infra(a)ovirt.org>
Gerrit-Comment-Date: Tue, 11 Sep 2018 12:37:22 +0000
Gerrit-HasComments: No
6 years, 2 months
issue in vdsm monitoring
by Dafna Ron
Hi,
I have been seeing random failures of tests in different projects caused by
vdsm monitoring.
I need someone from vdsm to please help debug this issue.
From what I can see, the test suspend/resume vm failed because we could not
query the status of the vm on the host.
you can see full log from failed tests here:
https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/10208/arti...
Here are the errors that I can see in the vdsm which seem to suggest that
there is an issue getting stats on the vm which was suspended and caused
the failure of the test:
2018-09-11 00:40:05,896-0400 INFO (monitor/c1fe6e6)
[storage.StorageDomain] Removing remnants of deleted images [] (fileSD:734)
2018-09-11 00:40:07,957-0400 DEBUG (qgapoller/1) [vds] Not sending QEMU-GA
command 'guest-get-users' to vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc',
command is not supported (qemuguestagent:192)
2018-09-11 00:40:08,068-0400 DEBUG (periodic/3)
[virt.sampling.VMBulkstatsMonitor] sampled timestamp 4296118.49 elapsed
0.010 acquired True domains all (sampling:443)
2018-09-11 00:40:08,271-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer]
Calling 'Image.prepare' in bridge with {u'allowIllegal': True,
u'storagepoolID': u'e80a56d9-74da-498a-b010-4a9df287f11d', u'imageID':
u'd4c831e6-02d2-4d89-b516-0ec4597
5e024', u'volumeID': u'15b07af1-625b-42e3-b62a-8e7c7a120a56',
u'storagedomainID': u'f1744940-41b6-4d35-b7bf-870c4e07d995'} (__init__:329)
2018-09-11 00:40:10,846-0400 DEBUG (vmchannels) [virt.vm]
(vmId='8214433a-f233-4aaa-aeda-2ce1d31c78dc') Guest connection timed out
(guestagent:556)
2018-09-11 00:40:11,637-0400 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer]
Calling 'Host.getStats' in bridge with {} (__init__:329)
2018-09-11 00:40:11,637-0400 INFO (jsonrpc/5) [api.host] START getStats()
from=::ffff:192.168.201.4,49184 (api:47)
2018-09-11 00:40:11,643-0400 DEBUG (jsonrpc/5) [root] cannot read eth0
speed (nic:42)
2018-09-11 00:40:11,645-0400 DEBUG (jsonrpc/5) [root] cannot read eth1
speed (nic:42)
2018-09-11 00:40:11,647-0400 DEBUG (jsonrpc/5) [root] cannot read eth2
speed (nic:42)
2018-09-11 00:40:11,649-0400 DEBUG (jsonrpc/5) [root] cannot read eth3
speed (nic:42)
2018-09-11 00:40:11,667-0400 INFO (jsonrpc/5) [api.host] FINISH getStats
error=[Errno 22] Invalid argument from=::ffff:192.168.201.4,49184 (api:51)
2018-09-11 00:40:11,667-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer]
Internal server error (__init__:350)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345,
in _handle_request
res = method(**params)
File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in
_dynamicMethod
result = fn(*methodArgs)
File "<string>", line 2, in getStats
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in
method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1407, in
getStats
multipath=True)}
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 49, in
get_stats
decStats = stats.produce(first_sample, last_sample)
File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 71, in
produce
stats.update(get_interfaces_stats())
File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 153, in
get_interfaces_stats
return net_api.network_stats()
File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 63, in
network_stats
return netstats.report()
File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py", line
31, in report
stats = link_stats.report()
File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py", line
41, in report
speed = vlan.speed(i.device)
File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py", line
36, in speed
dev_speed = nic.read_speed_using_sysfs(dev_name)
File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py", line
48, in read_speed_using_sysfs
s = int(f.read())
IOError: [Errno 22] Invalid argument
2018-09-11 00:40:11,669-0400 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC
call Host.getStats failed (error -32603) in 0.03 seconds (__init__:312)
2018-09-11 00:40:14,656-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer]
Calling 'Host.getAllVmStats' in bridge with {} (__init__:329)
2018-09-11 00:40:14,657-0400 INFO (jsonrpc/2) [api.host] START
getAllVmStats() from=::1,46772 (api:47)
2018-09-11 00:40:14,659-0400 INFO (jsonrpc/2) [api.host] FINISH
getAllVmStats return={'status': {'message': 'Done', 'code': 0},
'statsList': (suppressed)} from=::1,46772 (api:53)
2018-09-11 00:40:14,660-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer]
Return 'Host.getAllVmStats' in bridge with (suppressed) (__init__:356)
2018-09-11 00:40:14,661-0400 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC
call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
2018-09-11 00:40:14,674-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer]
Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:329)
2018-09-11 00:40:14,675-0400 INFO (jsonrpc/1) [api.host] START
getAllVmIoTunePolicies() from=::1,46772 (api:47)
2018-09-11 00:40:14,675-0400 INFO (jsonrpc/1) [api.host] FINISH
getAllVmIoTunePolicies return={'status': {'message': 'Done', 'code': 0},
'io_tune_policies_dict': {'8214433a-f233-4aaa-aeda-2ce1d31c78dc':
{'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L,
'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L,
'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path':
'/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/e2570635-a798-4e9e-a0ea-88436d5b296f',
'name': 'vda'}]}}} from=::1,46772 (api:53)
2018-09-11 00:40:14,675-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer]
Return 'Host.getAllVmIoTunePolicies' in bridge with
{'8214433a-f233-4aaa-aeda-2ce1d31c78dc': {'policy': [], 'current_values':
[{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec':
0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L},
'path':
'/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/e2570635-a798-4e9e-a0ea-88436d5b296f',
'name': 'vda'}]}} (__init__:356)
2018-09-11 00:40:14,675-0400 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC
call Host.getAllVmIoTunePolicies succeeded in 0.00 seconds (__init__:312)
2018-09-11 00:40:16,118-0400 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer]
Calling 'Host.getAllVmStats' in bridge with {} (__init__:329)
2018-09-11 00:40:16,119-0400 INFO (jsonrpc/7) [api.host] START
getAllVmStats() from=::ffff:192.168.201.4,49184 (api:47)
2018-09-11 00:40:16,121-0400 INFO (jsonrpc/7) [api.host] FINISH
getAllVmStats return={'status': {'message': 'Done', 'code': 0},
'statsList': (suppressed)} from=::ffff:192.168.201.4,49184 (api:53)
2018-09-11 00:40:16,122-0400 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer]
Return 'Host.getAllVmStats' in bridge with (suppressed) (__init__:356)
2018-09-11 00:40:16,122-0400 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC
call Host.getAllVmStats succeeded in 0.01 seconds (__init__:312)
2018-09-11 00:40:17,956-0400 DEBUG (qgapoller/2) [vds] Not sending QEMU-GA
command 'guest-get-users' to vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc',
command is not supported (qemuguestagent:192)
2018-09-11 00:40:18,025-0400 DEBUG (periodic/3) [virt.periodic] Looking for
stale paused VMs (periodic:388)
I can also see this in Super vdsm which may be related but I an not 100%
sure its related to current issue:
MainProcess|jsonrpc/2::DEBUG::2018-09-11
00:33:20,012::cmdutils::159::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
ifup/on039d2d88d8ee4::DEBUG::2018-09-11
00:33:20,238::cmdutils::159::root::(exec_cmd) FAILED: <err> = 'Running
scope as unit
2f1f417d-85b9-46c0-b7d7-6082836daaae.scope.\n/etc/sysconfig/network-scripts/ifup-eth:
line 321: 20715 Terminated
/sbin/dhclient ${DHCLIENTARGS} ${DEVICE}\nDevice
"on039d2d88d8ee4" does not exist.\nCannot find device
"on039d2d88d8ee4"\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d
8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
does not exist.\nDevic
e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d
8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
does not exist.\nDevic
e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d
8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
does not exist.\nDevic
e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\n'; <rc> = 1
ifup/on039d2d88d8ee4::ERROR::2018-09-11
00:33:20,238::concurrent::202::root::(run) FINISH thread
<Thread(ifup/on039d2d88d8ee4, stopped daemon 140281550731008)> failed
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line
195, in run
ret = func(*args, **kwargs)
File
"/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
line 948, in _exec_ifup
_exec_ifup_by_name(iface.name, cgroup)
File
"/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
line 934, in _exec_ifup_by_name
raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
ConfigNetworkError: (29, '\n')
thanks,
Dafna
6 years, 2 months