Hi,

I have been seeing random failures of tests in different projects caused by vdsm monitoring.

I need someone from vdsm to please help debug this issue.

From what I can see, the test suspend/resume vm failed because we could not query the status of the vm on the host.

you can see full log from failed tests here:

https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/10208/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-master/post-004_basic_sanity.py/

Here are the errors that I can see in the vdsm which seem to suggest that there is an issue getting stats on the vm which was suspended and caused the failure of the test:



2018-09-11 00:40:05,896-0400 INFO  (monitor/c1fe6e6) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:734)
2018-09-11 00:40:07,957-0400 DEBUG (qgapoller/1) [vds] Not sending QEMU-GA command 'guest-get-users' to vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc', command is not supported (qemuguestagent:192)
2018-09-11 00:40:08,068-0400 DEBUG (periodic/3) [virt.sampling.VMBulkstatsMonitor] sampled timestamp 4296118.49 elapsed 0.010 acquired True domains all (sampling:443)
2018-09-11 00:40:08,271-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Calling 'Image.prepare' in bridge with {u'allowIllegal': True, u'storagepoolID': u'e80a56d9-74da-498a-b010-4a9df287f11d', u'imageID': u'd4c831e6-02d2-4d89-b516-0ec4597
5e024', u'volumeID': u'15b07af1-625b-42e3-b62a-8e7c7a120a56', u'storagedomainID': u'f1744940-41b6-4d35-b7bf-870c4e07d995'} (__init__:329)


2018-09-11 00:40:10,846-0400 DEBUG (vmchannels) [virt.vm] (vmId='8214433a-f233-4aaa-aeda-2ce1d31c78dc') Guest connection timed out (guestagent:556)
2018-09-11 00:40:11,637-0400 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer] Calling 'Host.getStats' in bridge with {} (__init__:329)
2018-09-11 00:40:11,637-0400 INFO  (jsonrpc/5) [api.host] START getStats() from=::ffff:192.168.201.4,49184 (api:47)
2018-09-11 00:40:11,643-0400 DEBUG (jsonrpc/5) [root] cannot read eth0 speed (nic:42)
2018-09-11 00:40:11,645-0400 DEBUG (jsonrpc/5) [root] cannot read eth1 speed (nic:42)
2018-09-11 00:40:11,647-0400 DEBUG (jsonrpc/5) [root] cannot read eth2 speed (nic:42)
2018-09-11 00:40:11,649-0400 DEBUG (jsonrpc/5) [root] cannot read eth3 speed (nic:42)
2018-09-11 00:40:11,667-0400 INFO  (jsonrpc/5) [api.host] FINISH getStats error=[Errno 22] Invalid argument from=::ffff:192.168.201.4,49184 (api:51)
2018-09-11 00:40:11,667-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer] Internal server error (__init__:350)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
    result = fn(*methodArgs)
  File "<string>", line 2, in getStats
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1407, in getStats
    multipath=True)}
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 49, in get_stats
    decStats = stats.produce(first_sample, last_sample)
  File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 71, in produce
    stats.update(get_interfaces_stats())
  File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 153, in get_interfaces_stats
    return net_api.network_stats()
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 63, in network_stats
    return netstats.report()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py", line 31, in report
    stats = link_stats.report()
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py", line 41, in report
    speed = vlan.speed(i.device)
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py", line 36, in speed
    dev_speed = nic.read_speed_using_sysfs(dev_name)
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py", line 48, in read_speed_using_sysfs
    s = int(f.read())
IOError: [Errno 22] Invalid argument
2018-09-11 00:40:11,669-0400 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.getStats failed (error -32603) in 0.03 seconds (__init__:312)
2018-09-11 00:40:14,656-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmStats' in bridge with {} (__init__:329)
2018-09-11 00:40:14,657-0400 INFO  (jsonrpc/2) [api.host] START getAllVmStats() from=::1,46772 (api:47)
2018-09-11 00:40:14,659-0400 INFO  (jsonrpc/2) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,46772 (api:53)
2018-09-11 00:40:14,660-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] Return 'Host.getAllVmStats' in bridge with (suppressed) (__init__:356)
2018-09-11 00:40:14,661-0400 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
2018-09-11 00:40:14,674-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:329)
2018-09-11 00:40:14,675-0400 INFO  (jsonrpc/1) [api.host] START getAllVmIoTunePolicies() from=::1,46772 (api:47)
2018-09-11 00:40:14,675-0400 INFO  (jsonrpc/1) [api.host] FINISH getAllVmIoTunePolicies return={'status': {'message': 'Done', 'code': 0}, 'io_tune_policies_dict': {'8214433a-f233-4aaa-aeda-2ce1d31c78dc': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/e2570635-a798-4e9e-a0ea-88436d5b296f', 'name': 'vda'}]}}} from=::1,46772 (api:53)
2018-09-11 00:40:14,675-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Return 'Host.getAllVmIoTunePolicies' in bridge with {'8214433a-f233-4aaa-aeda-2ce1d31c78dc': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/e2570635-a798-4e9e-a0ea-88436d5b296f', 'name': 'vda'}]}} (__init__:356)
2018-09-11 00:40:14,675-0400 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmIoTunePolicies succeeded in 0.00 seconds (__init__:312)
2018-09-11 00:40:16,118-0400 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmStats' in bridge with {} (__init__:329)
2018-09-11 00:40:16,119-0400 INFO  (jsonrpc/7) [api.host] START getAllVmStats() from=::ffff:192.168.201.4,49184 (api:47)
2018-09-11 00:40:16,121-0400 INFO  (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::ffff:192.168.201.4,49184 (api:53)
2018-09-11 00:40:16,122-0400 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer] Return 'Host.getAllVmStats' in bridge with (suppressed) (__init__:356)
2018-09-11 00:40:16,122-0400 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.01 seconds (__init__:312)
2018-09-11 00:40:17,956-0400 DEBUG (qgapoller/2) [vds] Not sending QEMU-GA command 'guest-get-users' to vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc', command is not supported (qemuguestagent:192)
2018-09-11 00:40:18,025-0400 DEBUG (periodic/3) [virt.periodic] Looking for stale paused VMs (periodic:388)

I can also see this in Super vdsm which may be related but I an not 100% sure its related to current issue:

MainProcess|jsonrpc/2::DEBUG::2018-09-11 00:33:20,012::cmdutils::159::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
ifup/on039d2d88d8ee4::DEBUG::2018-09-11 00:33:20,238::cmdutils::159::root::(exec_cmd) FAILED: <err> = 'Running scope as unit 2f1f417d-85b9-46c0-b7d7-6082836daaae.scope.\n/etc/sysconfig/network-scripts/ifup-eth: line 321: 20715 Terminated
             /sbin/dhclient ${DHCLIENTARGS} ${DEVICE}\nDevice "on039d2d88d8ee4" does not exist.\nCannot find device "on039d2d88d8ee4"\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d
8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevic
e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
 exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d
8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevic
e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
 exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d
8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevic
e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not
 exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\n'; <rc> = 1
ifup/on039d2d88d8ee4::ERROR::2018-09-11 00:33:20,238::concurrent::202::root::(run) FINISH thread <Thread(ifup/on039d2d88d8ee4, stopped daemon 140281550731008)> failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 948, in _exec_ifup
    _exec_ifup_by_name(iface.name, cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 934, in _exec_ifup_by_name
    raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
ConfigNetworkError: (29, '\n')


thanks,
Dafna