Thanks Dominic.
Can you please review the network tests to fix this race between the two tests?


On Tue, Sep 11, 2018 at 9:53 PM, Dominik Holler <dholler@redhat.com> wrote:
On Tue, 11 Sep 2018 17:18:15 +0200
Dominik Holler <dholler@redhat.com> wrote:

> On Tue, 11 Sep 2018 12:22:21 +0100
> Dafna Ron <dron@redhat.com> wrote:
>
> > Can someone take ownership to fix it?
> >   
>
> I will take ownership and track the fix.
>
>
> > On Tue, Sep 11, 2018 at 12:04 PM, Petr Horacek <phoracek@redhat.com>
> > wrote:
> >   
> > > vdsm.log is the same issue, supervdsm.log seems unrelated.
> > >
> > > 2018-09-11 11:59 GMT+02:00 Dominik Holler <dholler@redhat.com>:
> > >   
> > >> On Tue, 11 Sep 2018 10:26:02 +0100
> > >> Dafna Ron <dron@redhat.com> wrote:
> > >>   
> > >> > Hi,
> > >> >
> > >> > I have been seeing random failures of tests in different
> > >> > projects caused by vdsm monitoring.
> > >> >
> > >> > I need someone from vdsm to please help debug this issue.
> > >> >   
> > >>
> > >> Petr, is this the same problem like yesterday in
> > >>
> > >> [ovirt-devel] failure in ost test - Invalid argument - help in
> > >> debugging issue
> > >>   
> > >> > From what I can see, the test suspend/resume vm failed because
> > >> > we could not query the status of the vm on the host.
> > >> >
> > >> > you can see full log from failed tests here:
> > >> >
> > >> > https://jenkins.ovirt.org/job/ovirt-master_change-queue-test   
> > >> er/10208/artifact/basic-suite.el7.x86_64/test_logs/basic-
> > >> suite-master/post-004_basic_sanity.py/   
> > >> >
> > >> > Here are the errors that I can see in the vdsm which seem to
> > >> > suggest that there is an issue getting stats on the vm which
> > >> > was suspended and caused the failure of the test:
> > >> >
> > >> >
> > >> >
> > >> > 2018-09-11 00:40:05,896-0400 INFO  (monitor/c1fe6e6)
> > >> > [storage.StorageDomain] Removing remnants of deleted images []
> > >> > (fileSD:734) 2018-09-11 00:40:07,957-0400 DEBUG (qgapoller/1)
> > >> > [vds] Not sending QEMU-GA command 'guest-get-users' to
> > >> > vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc', command is not
> > >> > supported (qemuguestagent:192) 2018-09-11 00:40:08,068-0400
> > >> > DEBUG (periodic/3) [virt.sampling.VMBulkstatsMonitor] sampled
> > >> > timestamp 4296118.49 elapsed 0.010 acquired True domains all
> > >> > (sampling:443) 2018-09-11 00:40:08,271-0400 DEBUG (jsonrpc/1)
> > >> > [jsonrpc.JsonRpcServer] Calling 'Image.prepare' in bridge with
> > >> > {u'allowIllegal': True, u'storagepoolID':
> > >> > u'e80a56d9-74da-498a-b010-4a9df287f11d', u'imageID':
> > >> > u'd4c831e6-02d2-4d89-b516-0ec4597 5e024', u'volumeID':
> > >> > u'15b07af1-625b-42e3-b62a-8e7c7a120a56', u'storagedomainID':
> > >> > u'f1744940-41b6-4d35-b7bf-870c4e07d995'} (__init__:329)
> > >> >
> > >> >
> > >> > 2018-09-11 00:40:10,846-0400 DEBUG (vmchannels) [virt.vm]
> > >> > (vmId='8214433a-f233-4aaa-aeda-2ce1d31c78dc') Guest connection
> > >> > timed out (guestagent:556)
> > >> > 2018-09-11 00:40:11,637-0400 DEBUG (jsonrpc/5)
> > >> > [jsonrpc.JsonRpcServer] Calling 'Host.getStats' in bridge with
> > >> > {} (__init__:329) 2018-09-11 00:40:11,637-0400 INFO
> > >> > (jsonrpc/5) [api.host] START getStats()
> > >> > from=::ffff:192.168.201.4,49184 (api:47) 2018-09-11
> > >> > 00:40:11,643-0400 DEBUG (jsonrpc/5) [root] cannot read eth0
> > >> > speed (nic:42) 2018-09-11 00:40:11,645-0400 DEBUG (jsonrpc/5)
> > >> > [root] cannot read eth1 speed (nic:42)
> > >> > 2018-09-11 00:40:11,647-0400 DEBUG (jsonrpc/5) [root] cannot
> > >> > read eth2 speed (nic:42)
> > >> > 2018-09-11 00:40:11,649-0400 DEBUG (jsonrpc/5) [root] cannot
> > >> > read eth3 speed (nic:42)
> > >> > 2018-09-11 00:40:11,667-0400 INFO  (jsonrpc/5) [api.host]
> > >> > FINISH getStats error=[Errno 22] Invalid argument
> > >> > from=::ffff:192.168.201.4,49184 (api:51) 2018-09-11
> > >> > 00:40:11,667-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer]
> > >> > Internal server error (__init__:350) Traceback (most recent
> > >> > call last): File
> > >> > "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
> > >> > 345, in _handle_request res = method(**params)
> > >> >   File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py",
> > >> > line 202, in _dynamicMethod
> > >> >     result = fn(*methodArgs)
> > >> >   File "<string>", line 2, in getStats
> > >> >   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py",
> > >> > line 49, in method
> > >> >     ret = func(*args, **kwargs)
> > >> >   File "/usr/lib/python2.7/site-packages/vdsm/API.py", line
> > >> > 1407, in getStats
> > >> >     multipath=True)}
> > >> >   File "/usr/lib/python2.7/site-packages/vdsm/host/api.py",
> > >> > line 49, in get_stats
> > >> >     decStats = stats.produce(first_sample, last_sample)
> > >> >   File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py",
> > >> > line 71, in produce
> > >> >     stats.update(get_interfaces_stats())
> > >> >   File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py",
> > >> > line 153, in get_interfaces_stats
> > >> >     return net_api.network_stats()
> > >> >   File "/usr/lib/python2.7/site-packages/vdsm/network/api.py",
> > >> > line 63, in network_stats
> > >> >     return netstats.report()
> > >> >   File
> > >> > "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py",
> > >> > line 31, in report stats = link_stats.report()
> > >> >   File
> > >> > "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py",
> > >> > line 41, in report speed = vlan.speed(i.device)
> > >> >   File
> > >> > "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py",
> > >> > line 36, in speed dev_speed =
> > >> > nic.read_speed_using_sysfs(dev_name) File
> > >> > "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py",
> > >> > line 48, in read_speed_using_sysfs s = int(f.read())
> > >> > IOError: [Errno 22] Invalid argument
> > >> > 2018-09-11 00:40:11,669-0400 INFO  (jsonrpc/5)
> > >> > [jsonrpc.JsonRpcServer] RPC call Host.getStats failed (error
> > >> > -32603) in 0.03 seconds (__init__:312) 2018-09-11
> > >> > 00:40:14,656-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer]
> > >> > Calling 'Host.getAllVmStats' in bridge with {} (__init__:329)
> > >> > 2018-09-11 00:40:14,657-0400 INFO (jsonrpc/2) [api.host] START
> > >> > getAllVmStats() from=::1,46772 (api:47) 2018-09-11
> > >> > 00:40:14,659-0400 INFO  (jsonrpc/2) [api.host] FINISH
> > >> > getAllVmStats return={'status': {'message': 'Done', 'code': 0},
> > >> > 'statsList': (suppressed)} from=::1,46772 (api:53) 2018-09-11
> > >> > 00:40:14,660-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer]
> > >> > Return 'Host.getAllVmStats' in bridge with (suppressed)
> > >> > (__init__:356) 2018-09-11 00:40:14,661-0400 INFO  (jsonrpc/2)
> > >> > [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded
> > >> > in 0.00 seconds (__init__:312) 2018-09-11 00:40:14,674-0400
> > >> > DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Calling
> > >> > 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:329)
> > >> > 2018-09-11 00:40:14,675-0400 INFO (jsonrpc/1) [api.host] START
> > >> > getAllVmIoTunePolicies() from=::1,46772 (api:47) 2018-09-11
> > >> > 00:40:14,675-0400 INFO  (jsonrpc/1) [api.host] FINISH
> > >> > getAllVmIoTunePolicies return={'status': {'message': 'Done',
> > >> > 'code': 0}, 'io_tune_policies_dict':
> > >> > {'8214433a-f233-4aaa-aeda-2ce1d31c78dc': {'policy': [],
> > >> > 'current_values': [{'ioTune': {'write_bytes_sec': 0L,
> > >> > 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec':
> > >> > 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path':
> > >> > '/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-   
> > >> 870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/
> > >> e2570635-a798-4e9e-a0ea-88436d5b296f',   
> > >> > 'name': 'vda'}]}}} from=::1,46772 (api:53) 2018-09-11
> > >> > 00:40:14,675-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer]
> > >> > Return 'Host.getAllVmIoTunePolicies' in bridge with
> > >> > {'8214433a-f233-4aaa-aeda-2ce1d31c78dc': {'policy': [],
> > >> > 'current_values': [{'ioTune': {'write_bytes_sec': 0L,
> > >> > 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec':
> > >> > 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path':
> > >> > '/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-   
> > >> 870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/
> > >> e2570635-a798-4e9e-a0ea-88436d5b296f',   
> > >> > 'name': 'vda'}]}} (__init__:356) 2018-09-11 00:40:14,675-0400
> > >> > INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call
> > >> > Host.getAllVmIoTunePolicies succeeded in 0.00 seconds
> > >> > (__init__:312) 2018-09-11 00:40:16,118-0400 DEBUG (jsonrpc/7)
> > >> > [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmStats' in bridge
> > >> > with {} (__init__:329) 2018-09-11 00:40:16,119-0400 INFO
> > >> > (jsonrpc/7) [api.host] START getAllVmStats()
> > >> > from=::ffff:192.168.201.4,49184 (api:47) 2018-09-11
> > >> > 00:40:16,121-0400 INFO  (jsonrpc/7) [api.host] FINISH
> > >> > getAllVmStats return={'status': {'message': 'Done', 'code': 0},
> > >> > 'statsList': (suppressed)} from=::ffff:192.168.201.4,49184
> > >> > (api:53) 2018-09-11 00:40:16,122-0400 DEBUG (jsonrpc/7)
> > >> > [jsonrpc.JsonRpcServer] Return 'Host.getAllVmStats' in bridge
> > >> > with (suppressed) (__init__:356) 2018-09-11 00:40:16,122-0400
> > >> > INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call
> > >> > Host.getAllVmStats succeeded in 0.01 seconds (__init__:312)
> > >> > 2018-09-11 00:40:17,956-0400 DEBUG (qgapoller/2) [vds] Not
> > >> > sending QEMU-GA command 'guest-get-users' to
> > >> > vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc', command is not
> > >> > supported (qemuguestagent:192) 2018-09-11 00:40:18,025-0400
> > >> > DEBUG (periodic/3) [virt.periodic] Looking for stale paused VMs
> > >> > (periodic:388)
> > >> >
> > >> > I can also see this in Super vdsm which may be related but I an
> > >> > not 100% sure its related to current issue:
> > >> >
> > >> > MainProcess|jsonrpc/2::DEBUG::2018-09-11
> > >> > 00:33:20,012::cmdutils::159::root::(exec_cmd) SUCCESS: <err> =
> > >> > ''; <rc> = 0 ifup/on039d2d88d8ee4::DEBUG::2018-09-11
> > >> > 00:33:20,238::cmdutils::159::root::(exec_cmd) FAILED: <err> =
> > >> > 'Running scope as unit
> > >> > 2f1f417d-85b9-46c0-b7d7-6082836daaae.scope.\n/etc/sysconfig/   
> > >> network-scripts/ifup-eth:   
> > >> > line 321: 20715 Terminated
> > >> >              /sbin/dhclient ${DHCLIENTARGS} ${DEVICE}\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\nCannot find device
> > >> > "on039d2d88d8ee4"\nDevice "on039d2d88d8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d 8ee4" does not exist.\nDevice "on039d2d88d8ee4"
> > >> > does not exist.\nDevice "on039d2d88d8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
> > >> > does not exist.\nDevic e "on039d2d88d8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
> > >> > does not exist.\nDevice "on039d2d88d8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
> > >> > does not exist.\nDevice "on039d2d88d8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d 8ee4"
> > >> > does not exist.\nDevice "on039d2d88d8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
> > >> > does not exist.\nDevice "on039d2d88d8ee4" does not
> > >> > exist.\nDevic e "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
> > >> > does not exist.\nDevice "on039d2d88d8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
> > >> > does not exist.\nDevice "on039d2d88d8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
> > >> > does not exist.\nDevice "on039d2d88d 8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
> > >> > does not exist.\nDevice "on039d2d88d8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevic e
> > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
> > >> > does not exist.\nDevice "on039d2d88d8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4"
> > >> > does not exist.\nDevice "on039d2d88d8ee4" does not
> > >> > exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice
> > >> > "on039d2d88d8ee4" does not exist.\n'; <rc> = 1
> > >> > ifup/on039d2d88d8ee4::ERROR::2018-09-11
> > >> > 00:33:20,238::concurrent::202::root::(run) FINISH thread
> > >> > <Thread(ifup/on039d2d88d8ee4, stopped daemon 140281550731008)>
> > >> > failed Traceback (most recent call last): File
> > >> > "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py",
> > >> > line 195, in run ret = func(*args, **kwargs) File
> > >> > "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
> > >> > line 948, in _exec_ifup _exec_ifup_by_name(iface.name, cgroup)
> > >> >   File
> > >> > "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py",
> > >> > line 934, in _exec_ifup_by_name
> > >> >     raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out
> > >> > else '') ConfigNetworkError: (29, '\n')
> > >> >
> > >> >


What happens in
basic-suite-master/test-scenarios/003_basic_networking.py :
modify_host_0_ip_to_dhcp set VM_NETWORK to use DHCP,
so vdsm is calling ifup, which tries to configure via DHCP in the
background.
The next test detach_vm_network_from_host_0 removes the network, while
ifup still hangs in dhcp configuration, because in this VLAN there is
no DHCP server.
So from my point of view this error in supervdsm.log is legitimate,
because VDSM was advised to configure via DHCP, which fails because the
interface is removed during dhcp configuration.
I did not recognize that this triggeres any chained errors.

> > >> > thanks,
> > >> > Dafna   
> > >>
> > >>   
> > >   
>