Hi,
I have been seeing random failures of tests in different projects caused by
vdsm monitoring.
I need someone from vdsm to please help debug this issue.
From what I can see, the test suspend/resume vm failed because we could not
query the status of the vm on the host.
you can see full log from failed tests here:
https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/10208/arti...
Here are the errors that I can see in the vdsm which seem to suggest that
there is an issue getting stats on the vm which was suspended and caused
the failure of the test:
2018-09-11 00:40:05,896-0400 INFO (monitor/c1fe6e6)
[storage.StorageDomain] Removing remnants of deleted images [] (fileSD:734)
2018-09-11 00:40:07,957-0400 DEBUG (qgapoller/1) [vds] Not sending QEMU-GA
command 'guest-get-users' to
vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc',
command is not supported (qemuguestagent:192)
2018-09-11 00:40:08,068-0400 DEBUG (periodic/3)
[virt.sampling.VMBulkstatsMonitor] sampled timestamp 4296118.49 elapsed
0.010 acquired True domains all (sampling:443)
2018-09-11 00:40:08,271-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer]
Calling 'Image.prepare' in bridge with {u'allowIllegal': True,
u'storagepoolID': u'e80a56d9-74da-498a-b010-4a9df287f11d',
u'imageID':
u'd4c831e6-02d2-4d89-b516-0ec4597
5e024', u'volumeID': u'15b07af1-625b-42e3-b62a-8e7c7a120a56',
u'storagedomainID': u'f1744940-41b6-4d35-b7bf-870c4e07d995'}
(__init__:329)
2018-09-11 00:40:10,846-0400 DEBUG (vmchannels) [virt.vm]
(vmId='8214433a-f233-4aaa-aeda-2ce1d31c78dc') Guest connection timed out
(guestagent:556)
2018-09-11 00:40:11,637-0400 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer]
Calling 'Host.getStats' in bridge with {} (__init__:329)
2018-09-11 00:40:11,637-0400 INFO (jsonrpc/5) [api.host] START getStats()
from=::ffff:192.168.201.4,49184 (api:47)
2018-09-11 00:40:11,643-0400 DEBUG (jsonrpc/5) [root] cannot read eth0
speed (nic:42)
2018-09-11 00:40:11,645-0400 DEBUG (jsonrpc/5) [root] cannot read eth1
speed (nic:42)
2018-09-11 00:40:11,647-0400 DEBUG (jsonrpc/5) [root] cannot read eth2
speed (nic:42)
2018-09-11 00:40:11,649-0400 DEBUG (jsonrpc/5) [root] cannot read eth3
speed (nic:42)
2018-09-11 00:40:11,667-0400 INFO (jsonrpc/5) [api.host] FINISH getStats
error=[Errno 22] Invalid argument from=::ffff:192.168.201.4,49184 (api:51)
2018-09-11 00:40:11,667-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer]
Internal server error (__init__:350)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345,
in _handle_request
res = method(**params)
File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in
_dynamicMethod
result = fn(*methodArgs)
File "<string>", line 2, in getStats
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in
method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1407, in
getStats
multipath=True)}
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 49, in
get_stats
decStats = stats.produce(first_sample, last_sample)
File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 71, in
produce
stats.update(get_interfaces_stats())
File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 153, in
get_interfaces_stats
return net_api.network_stats()
File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 63, in
network_stats
return netstats.report()
File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py", line
31, in report
stats = link_stats.report()
File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py", line
41, in report
speed = vlan.speed(i.device)
File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py", line
36, in speed
dev_speed = nic.read_speed_using_sysfs(dev_name)
File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py", line
48, in read_speed_using_sysfs
s = int(f.read())
IOError: [Errno 22] Invalid argument
2018-09-11 00:40:11,669-0400 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC
call Host.getStats failed (error -32603) in 0.03 seconds (__init__:312)
2018-09-11 00:40:14,656-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer]
Calling 'Host.getAllVmStats' in bridge with {} (__init__:329)
2018-09-11 00:40:14,657-0400 INFO (jsonrpc/2) [api.host] START
getAllVmStats() from=::1,46772 (api:47)
2018-09-11 00:40:14,659-0400 INFO (jsonrpc/2) [api.host] FINISH
getAllVmStats return={'status': {'message': 'Done',
'code': 0},
'statsList': (suppressed)} from=::1,46772 (api:53)
2018-09-11 00:40:14,660-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer]
Return 'Host.getAllVmStats' in bridge with (suppressed) (__init__:356)
2018-09-11 00:40:14,661-0400 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC
call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
2018-09-11 00:40:14,674-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer]
Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:329)
2018-09-11 00:40:14,675-0400 INFO (jsonrpc/1) [api.host] START
getAllVmIoTunePolicies() from=::1,46772 (api:47)
2018-09-11 00:40:14,675-0400 INFO (jsonrpc/1) [api.host] FINISH
getAllVmIoTunePolicies return={'status': {'message': 'Done',
'code': 0},
'io_tune_policies_dict': {'8214433a-f233-4aaa-aeda-2ce1d31c78dc':
{'policy': [], 'current_values': [{'ioTune':
{'write_bytes_sec': 0L,
'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L,
'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path':
'/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/e2570635-a798-4e9e-a0ea-88436d5b296f',
'name': 'vda'}]}}} from=::1,46772 (api:53)
2018-09-11 00:40:14,675-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer]
Return 'Host.getAllVmIoTunePolicies' in bridge with
{'8214433a-f233-4aaa-aeda-2ce1d31c78dc': {'policy': [],
'current_values':
[{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L,
'read_iops_sec':
0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec':
0L},
'path':
'/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/e2570635-a798-4e9e-a0ea-88436d5b296f',
'name': 'vda'}]}} (__init__:356)
2018-09-11 00:40:14,675-0400 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC
call Host.getAllVmIoTunePolicies succeeded in 0.00 seconds (__init__:312)
2018-09-11 00:40:16,118-0400 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer]
Calling 'Host.getAllVmStats' in bridge with {} (__init__:329)
2018-09-11 00:40:16,119-0400 INFO (jsonrpc/7) [api.host] START
getAllVmStats() from=::ffff:192.168.201.4,49184 (api:47)
2018-09-11 00:40:16,121-0400 INFO (jsonrpc/7) [api.host] FINISH
getAllVmStats return={'status': {'message': 'Done',
'code': 0},
'statsList': (suppressed)} from=::ffff:192.168.201.4,49184 (api:53)
2018-09-11 00:40:16,122-0400 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer]
Return 'Host.getAllVmStats' in bridge with (suppressed) (__init__:356)
2018-09-11 00:40:16,122-0400 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC
call Host.getAllVmStats succeeded in 0.01 seconds (__init__:312)
2018-09-11 00:40:17,956-0400 DEBUG (qgapoller/2) [vds] Not sending QEMU-GA
command 'guest-get-users' to
vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc',
command is not supported (qemuguestagent:192)
2018-09-11 00:40:18,025-0400 DEBUG (periodic/3) [virt.periodic] Looking for
stale paused VMs (periodic:388)
I can also see this in Super vdsm which may be related but I an not 100%
sure its related to current issue:
MainProcess|jsonrpc/2::DEBUG::2018-09-11
00:33:20,012::cmdutils::159::root::(exec_cmd) SUCCESS: <err> = '';
<rc> = 0
ifup/on039d2d88d8ee4::DEBUG::2018-09-11
00:33:20,238::cmdutils::159::root::(exec_cmd) FAILED: <err> = 'Running
scope as unit
2f1f417d-85b9-46c0-b7d7-6082836daaae.scope.\n/etc/sysconfig/network-scripts/ifup-eth:
line 321: 20715 Terminated
/sbin/dhclient ${DHCLIENTARGS} ${DEVICE}\nDevice
"on039d2d88d8ee4" does not exist.\nCannot find device
"on039d2d88d8ee4"\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d
8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4"
does not exist.\nDevic
e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does
not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d
8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4"
does not exist.\nDevic
e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does
not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d
8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4"
does not exist.\nDevic
e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does
not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not
exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d8ee4"
does not exist.\nDevice "on039d2d88d8ee4" does not exist.\n'; <rc> =
1
ifup/on039d2d88d8ee4::ERROR::2018-09-11
00:33:20,238::concurrent::202::root::(run) FINISH thread
<Thread(ifup/on039d2d88d8ee4, stopped daemon 140281550731008)> failed
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line
195, in run
ret = func(*args, **kwargs)
File
"/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
line 948, in _exec_ifup
_exec_ifup_by_name(iface.name, cgroup)
File
"/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
line 934, in _exec_ifup_by_name
raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
ConfigNetworkError: (29, '\n')
thanks,
Dafna