
Hi, I have issues with one host where supervdsm is failing in network_caps. I see the following trace in the log. MainProcess|jsonrpc/1::ERROR::2020-01-06 03:01:05,558::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper) Error in network_caps Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 98, in wrapper res = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 56, in network_caps return netswitch.configurator.netcaps(compatibility=30600) File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 317, in netcaps net_caps = netinfo(compatibility=compatibility) File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch/configurator.py", line 325, in netinfo _netinfo = netinfo_get(vdsmnets, compatibility) File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py", line 150, in get return _stringify_mtus(_get(vdsmnets)) File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py", line 59, in _get ipaddrs = getIpAddrs() File "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/addresses.py", line 72, in getIpAddrs for addr in nl_addr.iter_addrs(): File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/addr.py", line 33, in iter_addrs with _nl_addr_cache(sock) as addr_cache: File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__ return self.gen.next() File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/__init__.py", line 92, in _cache_manager cache = cache_allocator(sock) File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/libnl.py", line 469, in rtnl_addr_alloc_cache raise IOError(-err, nl_geterror(err)) IOError: [Errno 16] Message sequence number mismatch A restart of supervdsm will resolve the issue for a period, maybe 24 hours, then it will occur again. So I'm thinking it's resource exhaustion or a leak of some kind? Running 4.2.8.2 with VDSM at 4.20.46. I've had a look through the bugzilla and can't find an exact match, closest was this one https://bugzilla.redhat.com/show_bug.cgi?id=1666123 which seems to be a RHV only fix. Thanks, Alan