Hi,
I sent this a while back and never got a response. We've since upgraded to 4.3 and the issue persists.
2021-03-24 10:53:48,934+0000 ERROR (periodic/2) [virt.periodic.Operation] <vdsm.virt.sampling.HostMonitor object at 0x7f5964398350> operation failed (periodic:188)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 186, in __call__
    self._func()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 481, in __call__
    stats = hostapi.get_stats(self._cif, self._samples.stats())
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 50, in get_stats
    decStats = stats.produce(first_sample, last_sample)
  File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 72, in produce
    stats.update(get_interfaces_stats())
  File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 154, in get_interfaces_stats
    return net_api.network_stats()
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 63, in network_stats
    return netstats.report()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py", line 32, in report
    stats = link_stats.report()
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py", line 34, in report
    for iface_properties in iface.list():
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/iface.py", line 257, in list
    for properties in itertools.chain(link.iter_links(), dpdk_links):
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/link.py", line 47, in iter_links
    with _nl_link_cache(sock) as cache:
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/__init__.py", line 108, in _cache_manager
    cache = cache_allocator(sock)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/link.py", line 157, in _rtnl_link_alloc_cache
    return libnl.rtnl_link_alloc_cache(socket, AF_UNSPEC)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/libnl.py", line 578, in rtnl_link_alloc_cache
    raise IOError(-err, nl_geterror(err))
IOError: [Errno 16] Message sequence number mismatch
This occurs on both nodes in the cluster. Restarting vdsm/supervdsm sorts it for a while, but within 24 hours it occurs again. We run a number of clusters and it only occurs on one, so it must be some specific corner case we're triggering.
I can find almost no information on this. The best I could find was this:
https://linuxlizard.com/2020/10/18/message-sequence-number-mismatch-in-libnl/ which details a sequence number issue. I'm guessing I'm hitting the same thing: the nl sequence numbers are getting out of sync, and closing/re-opening the nl socket (i.e. restarting vdsm) is the only way to resolve it.
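To illustrate what I think is going on, here is a toy model of the sequence check. It is only a sketch of my understanding, not libnl or vdsm code: the class and function names are hypothetical, and the "lost reply" scenario is an assumption about how the counters could drift, not something I have confirmed on these hosts.

```python
# Toy model only: NOT libnl or vdsm code. All names here are hypothetical.


class SequenceMismatch(IOError):
    """Stands in for the IOError(16, ...) seen in the traceback."""


class ToyNetlinkSocket(object):
    """Keeps two counters, the way I understand a netlink socket does."""

    def __init__(self):
        self.next_seq = 1      # stamped on the next outgoing request
        self.expected_seq = 1  # expected on the next incoming reply

    def send_request(self):
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def receive_reply(self, reply_seq):
        if reply_seq != self.expected_seq:
            raise SequenceMismatch(16, "Message sequence number mismatch")
        self.expected_seq += 1


def demo():
    results = []
    sock = ToyNetlinkSocket()

    # Normal request/reply pair: counters stay in step.
    sock.receive_reply(sock.send_request())
    results.append("ok")

    # Assumed failure mode: a request goes out but its reply is never read,
    # so expected_seq is left one behind next_seq.
    sock.send_request()
    try:
        sock.receive_reply(sock.send_request())
        results.append("ok")
    except SequenceMismatch:
        results.append("mismatch")

    # A fresh socket (the equivalent of restarting vdsm) resets both counters.
    sock = ToyNetlinkSocket()
    sock.receive_reply(sock.send_request())
    results.append("ok")
    return results
```

In this model, once the two counters drift apart every later request on the same socket fails, which would match what I see: nothing recovers until the socket is recreated.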
I've completely hit a brick wall with it. We've had to disable fencing on both nodes, as they sometimes get erroneously fenced when vdsm stops functioning correctly. At this point I'm thinking about replacing the servers with different models in case it's something in the NIC drivers...
Alan