Thanks Dominic.
Can you please review the network tests to fix this race between the
two tests?
Dafna, why do you think the current behavior should be changed, or
how the current behavior should be improved?
On Tue, Sep 11, 2018 at 9:53 PM, Dominik Holler
<dholler(a)redhat.com>
wrote:
> On Tue, 11 Sep 2018 17:18:15 +0200
> Dominik Holler <dholler(a)redhat.com> wrote:
>
> > On Tue, 11 Sep 2018 12:22:21 +0100
> > Dafna Ron <dron(a)redhat.com> wrote:
> >
> > > Can someone take ownership to fix it?
> > >
> >
> > I will take ownership and track the fix.
> >
> >
> > > On Tue, Sep 11, 2018 at 12:04 PM, Petr Horacek
> > > <phoracek(a)redhat.com> wrote:
> > >
> > > > vdsm.log is the same issue, supervdsm.log seems unrelated.
> > > >
> > > > 2018-09-11 11:59 GMT+02:00 Dominik Holler
> > > > <dholler(a)redhat.com>:
> > > >> On Tue, 11 Sep 2018 10:26:02 +0100
> > > >> Dafna Ron <dron(a)redhat.com> wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > I have been seeing random failures of tests in different
> > > >> > projects caused by vdsm monitoring.
> > > >> >
> > > >> > I need someone from vdsm to please help debug this issue.
> > > >> >
> > > >>
> > > >> Petr, is this the same problem like yesterday in
> > > >>
> > > >> [ovirt-devel] failure in ost test - Invalid argument - help
> > > >> in debugging issue
> > > >>
> > > >> > From what I can see, the test suspend/resume vm failed
> > > >> > because we could not query the status of the vm on the
> > > >> > host.
> > > >> >
> > > >> > you can see full log from failed tests here:
> > > >> >
> > > >> >
https://jenkins.ovirt.org/job/ovirt-master_change-queue-test
> > > >> er/10208/artifact/basic-suite.el7.x86_64/test_logs/basic-
> > > >> suite-master/post-004_basic_sanity.py/
> > > >> >
> > > >> > Here are the errors that I can see in the vdsm which seem
> > > >> > to suggest that there is an issue getting stats on the vm
> > > >> > which was suspended and caused the failure of the test:
> > > >> >
> > > >> >
> > > >> >
> > > >> > 2018-09-11 00:40:05,896-0400 INFO (monitor/c1fe6e6)
> > > >> > [storage.StorageDomain] Removing remnants of deleted
> > > >> > images [] (fileSD:734) 2018-09-11 00:40:07,957-0400 DEBUG
> > > >> > (qgapoller/1) [vds] Not sending QEMU-GA command
> > > >> > 'guest-get-users' to
> > > >> > vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc',
command is
> > > >> > not supported (qemuguestagent:192) 2018-09-11
> > > >> > 00:40:08,068-0400 DEBUG (periodic/3)
> > > >> > [virt.sampling.VMBulkstatsMonitor] sampled timestamp
> > > >> > 4296118.49 elapsed 0.010 acquired True domains all
> > > >> > (sampling:443) 2018-09-11 00:40:08,271-0400 DEBUG
> > > >> > (jsonrpc/1) [jsonrpc.JsonRpcServer] Calling
> > > >> > 'Image.prepare' in bridge with
{u'allowIllegal': True,
> > > >> > u'storagepoolID':
u'e80a56d9-74da-498a-b010-4a9df287f11d',
> > > >> > u'imageID': u'd4c831e6-02d2-4d89-b516-0ec4597
5e024',
> > > >> > u'volumeID':
u'15b07af1-625b-42e3-b62a-8e7c7a120a56',
> > > >> > u'storagedomainID':
> > > >> > u'f1744940-41b6-4d35-b7bf-870c4e07d995'}
(__init__:329)
> > > >> >
> > > >> >
> > > >> > 2018-09-11 00:40:10,846-0400 DEBUG (vmchannels) [virt.vm]
> > > >> > (vmId='8214433a-f233-4aaa-aeda-2ce1d31c78dc') Guest
> > > >> > connection timed out (guestagent:556)
> > > >> > 2018-09-11 00:40:11,637-0400 DEBUG (jsonrpc/5)
> > > >> > [jsonrpc.JsonRpcServer] Calling 'Host.getStats' in
bridge
> > > >> > with {} (__init__:329) 2018-09-11 00:40:11,637-0400 INFO
> > > >> > (jsonrpc/5) [api.host] START getStats()
> > > >> > from=::ffff:192.168.201.4,49184 (api:47) 2018-09-11
> > > >> > 00:40:11,643-0400 DEBUG (jsonrpc/5) [root] cannot read eth0
> > > >> > speed (nic:42) 2018-09-11 00:40:11,645-0400 DEBUG
> > > >> > (jsonrpc/5) [root] cannot read eth1 speed (nic:42)
> > > >> > 2018-09-11 00:40:11,647-0400 DEBUG (jsonrpc/5) [root]
> > > >> > cannot read eth2 speed (nic:42)
> > > >> > 2018-09-11 00:40:11,649-0400 DEBUG (jsonrpc/5) [root]
> > > >> > cannot read eth3 speed (nic:42)
> > > >> > 2018-09-11 00:40:11,667-0400 INFO (jsonrpc/5) [api.host]
> > > >> > FINISH getStats error=[Errno 22] Invalid argument
> > > >> > from=::ffff:192.168.201.4,49184 (api:51) 2018-09-11
> > > >> > 00:40:11,667-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer]
> > > >> > Internal server error (__init__:350) Traceback (most recent
> > > >> > call last): File
> > > >> >
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py",
> > > >> > line 345, in _handle_request res = method(**params)
> > > >> > File
> > > >> >
"/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py",
> > > >> > line 202, in _dynamicMethod result = fn(*methodArgs)
> > > >> > File "<string>", line 2, in getStats
> > > >> > File
> > > >> >
"/usr/lib/python2.7/site-packages/vdsm/common/api.py",
> > > >> > line 49, in method ret = func(*args, **kwargs)
> > > >> > File
"/usr/lib/python2.7/site-packages/vdsm/API.py", line
> > > >> > 1407, in getStats
> > > >> > multipath=True)}
> > > >> > File
"/usr/lib/python2.7/site-packages/vdsm/host/api.py",
> > > >> > line 49, in get_stats
> > > >> > decStats = stats.produce(first_sample, last_sample)
> > > >> > File
> > > >> >
"/usr/lib/python2.7/site-packages/vdsm/host/stats.py",
> > > >> > line 71, in produce stats.update(get_interfaces_stats())
> > > >> > File
> > > >> >
"/usr/lib/python2.7/site-packages/vdsm/host/stats.py",
> > > >> > line 153, in get_interfaces_stats return
> > > >> > net_api.network_stats() File
> > > >> >
"/usr/lib/python2.7/site-packages/vdsm/network/api.py",
> > > >> > line 63, in network_stats return netstats.report()
> > > >> > File
> > > >> >
"/usr/lib/python2.7/site-packages/vdsm/network/netstats.py",
> > > >> > line 31, in report stats = link_stats.report()
> > > >> > File
> > > >> >
"/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py",
> > > >> > line 41, in report speed = vlan.speed(i.device)
> > > >> > File
> > > >> >
"/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py",
> > > >> > line 36, in speed dev_speed =
> > > >> > nic.read_speed_using_sysfs(dev_name) File
> > > >> >
"/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py",
> > > >> > line 48, in read_speed_using_sysfs s = int(f.read())
> > > >> > IOError: [Errno 22] Invalid argument
> > > >> > 2018-09-11 00:40:11,669-0400 INFO (jsonrpc/5)
> > > >> > [jsonrpc.JsonRpcServer] RPC call Host.getStats failed
> > > >> > (error -32603) in 0.03 seconds (__init__:312) 2018-09-11
> > > >> > 00:40:14,656-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer]
> > > >> > Calling 'Host.getAllVmStats' in bridge with {}
> > > >> > (__init__:329) 2018-09-11 00:40:14,657-0400 INFO
> > > >> > (jsonrpc/2) [api.host] START getAllVmStats()
> > > >> > from=::1,46772 (api:47) 2018-09-11 00:40:14,659-0400 INFO
> > > >> > (jsonrpc/2) [api.host] FINISH getAllVmStats
> > > >> > return={'status': {'message':
'Done', 'code': 0},
> > > >> > 'statsList': (suppressed)} from=::1,46772 (api:53)
> > > >> > 2018-09-11 00:40:14,660-0400 DEBUG (jsonrpc/2)
> > > >> > [jsonrpc.JsonRpcServer] Return 'Host.getAllVmStats'
in
> > > >> > bridge with (suppressed) (__init__:356) 2018-09-11
> > > >> > 00:40:14,661-0400 INFO (jsonrpc/2)
> > > >> > [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats
> > > >> > succeeded in 0.00 seconds (__init__:312) 2018-09-11
> > > >> > 00:40:14,674-0400 DEBUG (jsonrpc/1)
> > > >> > [jsonrpc.JsonRpcServer] Calling
> > > >> > 'Host.getAllVmIoTunePolicies' in bridge with {}
> > > >> > (__init__:329) 2018-09-11 00:40:14,675-0400 INFO
> > > >> > (jsonrpc/1) [api.host] START getAllVmIoTunePolicies()
> > > >> > from=::1,46772 (api:47) 2018-09-11 00:40:14,675-0400 INFO
> > > >> > (jsonrpc/1) [api.host] FINISH getAllVmIoTunePolicies
> > > >> > return={'status': {'message':
'Done', 'code': 0},
> > > >> > 'io_tune_policies_dict':
> > > >> > {'8214433a-f233-4aaa-aeda-2ce1d31c78dc':
{'policy': [],
> > > >> > 'current_values': [{'ioTune':
{'write_bytes_sec': 0L,
> > > >> > 'total_iops_sec': 0L, 'read_iops_sec': 0L,
> > > >> > 'read_bytes_sec': 0L, 'write_iops_sec': 0L,
> > > >> > 'total_bytes_sec': 0L}, 'path':
> > > >> > '/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-
> > > >> 870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/
> > > >> e2570635-a798-4e9e-a0ea-88436d5b296f',
> > > >> > 'name': 'vda'}]}}} from=::1,46772 (api:53)
2018-09-11
> > > >> > 00:40:14,675-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer]
> > > >> > Return 'Host.getAllVmIoTunePolicies' in bridge with
> > > >> > {'8214433a-f233-4aaa-aeda-2ce1d31c78dc':
{'policy': [],
> > > >> > 'current_values': [{'ioTune':
{'write_bytes_sec': 0L,
> > > >> > 'total_iops_sec': 0L, 'read_iops_sec': 0L,
> > > >> > 'read_bytes_sec': 0L, 'write_iops_sec': 0L,
> > > >> > 'total_bytes_sec': 0L}, 'path':
> > > >> > '/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-
> > > >> 870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/
> > > >> e2570635-a798-4e9e-a0ea-88436d5b296f',
> > > >> > 'name': 'vda'}]}} (__init__:356) 2018-09-11
> > > >> > 00:40:14,675-0400 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer]
> > > >> > RPC call Host.getAllVmIoTunePolicies succeeded in 0.00
> > > >> > seconds (__init__:312) 2018-09-11 00:40:16,118-0400 DEBUG
> > > >> > (jsonrpc/7) [jsonrpc.JsonRpcServer] Calling
> > > >> > 'Host.getAllVmStats' in bridge with {}
(__init__:329)
> > > >> > 2018-09-11 00:40:16,119-0400 INFO (jsonrpc/7) [api.host]
> > > >> > START getAllVmStats() from=::ffff:192.168.201.4,49184
> > > >> > (api:47) 2018-09-11 00:40:16,121-0400 INFO (jsonrpc/7)
> > > >> > [api.host] FINISH getAllVmStats return={'status':
> > > >> > {'message': 'Done', 'code': 0},
'statsList': (suppressed)}
> > > >> > from=::ffff:192.168.201.4,49184 (api:53) 2018-09-11
> > > >> > 00:40:16,122-0400 DEBUG (jsonrpc/7)
> > > >> > [jsonrpc.JsonRpcServer] Return 'Host.getAllVmStats'
in
> > > >> > bridge with (suppressed) (__init__:356) 2018-09-11
> > > >> > 00:40:16,122-0400 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer]
> > > >> > RPC call Host.getAllVmStats succeeded in 0.01 seconds
> > > >> > (__init__:312) 2018-09-11 00:40:17,956-0400 DEBUG
> > > >> > (qgapoller/2) [vds] Not sending QEMU-GA command
> > > >> > 'guest-get-users' to
> > > >> > vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc',
command is
> > > >> > not supported (qemuguestagent:192) 2018-09-11
> > > >> > 00:40:18,025-0400 DEBUG (periodic/3) [virt.periodic]
> > > >> > Looking for stale paused VMs (periodic:388)
> > > >> >
> > > >> > I can also see this in Super vdsm which may be related but
> > > >> > I an not 100% sure its related to current issue:
> > > >> >
> > > >> > MainProcess|jsonrpc/2::DEBUG::2018-09-11
> > > >> > 00:33:20,012::cmdutils::159::root::(exec_cmd) SUCCESS:
> > > >> > <err> = ''; <rc> = 0
> > > >> > ifup/on039d2d88d8ee4::DEBUG::2018-09-11
> > > >> > 00:33:20,238::cmdutils::159::root::(exec_cmd) FAILED:
> > > >> > <err> = 'Running scope as unit
> > > >> > 2f1f417d-85b9-46c0-b7d7-6082836daaae.scope.\n/etc/sysconfig/
> > > >> network-scripts/ifup-eth:
> > > >> > line 321: 20715 Terminated
> > > >> > /sbin/dhclient ${DHCLIENTARGS}
> > > >> > ${DEVICE}\nDevice "on039d2d88d8ee4" does not
> > > >> > exist.\nCannot find device
"on039d2d88d8ee4"\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d
> > > >> > 8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not
> > > >> > exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevic e
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d
> > > >> > 8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not
> > > >> > exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevic e
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
"on039d2d88d
> > > >> > 8ee4" does not exist.\nDevice
"on039d2d88d8ee4" does not
> > > >> > exist.\nDevice "on039d2d88d8ee4" does not
exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevic e
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\nDevice
> > > >> > "on039d2d88d8ee4" does not exist.\n';
<rc> = 1
> > > >> > ifup/on039d2d88d8ee4::ERROR::2018-09-11
> > > >> > 00:33:20,238::concurrent::202::root::(run) FINISH thread
> > > >> > <Thread(ifup/on039d2d88d8ee4, stopped daemon
> > > >> > 140281550731008)> failed Traceback (most recent call
> > > >> > last): File
> > > >> >
"/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py",
> > > >> > line 195, in run ret = func(*args, **kwargs) File
> > > >> > "/usr/lib/python2.7/site-packages/vdsm/network/
> configurators/ifcfg.py",
> > > >> > line 948, in _exec_ifup _exec_ifup_by_name(iface.name,
> > > >> > cgroup) File
> > > >> > "/usr/lib/python2.7/site-packages/vdsm/network/
> configurators/ifcfg.py",
> > > >> > line 934, in _exec_ifup_by_name
> > > >> > raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if
> > > >> > out else '') ConfigNetworkError: (29, '\n')
> > > >> >
> > > >> >
>
>
> What happens in
> basic-suite-master/test-scenarios/003_basic_networking.py :
> modify_host_0_ip_to_dhcp set VM_NETWORK to use DHCP,
> so vdsm is calling ifup, which tries to configure via DHCP in the
> background.
> The next test detach_vm_network_from_host_0 removes the network,
> while ifup still hangs in dhcp configuration, because in this VLAN
> there is no DHCP server.
> So from my point of view this error in supervdsm.log is legitimate,
> because VDSM was advised to configure via DHCP, which fails because
> the interface is removed during dhcp configuration.
> I did not recognize that this triggeres any chained errors.
>
> > > >> > thanks,
> > > >> > Dafna
> > > >>
> > > >>
> > > >
> >
>
>