While debugging https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.3/427,
in engine.log at 2020-04-28 23:48:18,378-04 I see:
SetVdsStatusVDSCommandParameters:{
  hostId='b34db269-5351-4653-9a0c-90a9154cd687',
  status='NonOperational',
  nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE',
  stopSpmFailureLogged='false',
  maintenanceReason='null'}

So when the test tries to put host1 into local maintenance at 2020-04-28 23:59:51, it fails with:
Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_NO_ALTERNATE_HOST_FOR_HOSTED_ENGINE
vdsm on host0 shows a traceback:
2020-04-28 23:43:04,944-0400 ERROR (jsonrpc/0) [vds] setKsmTune API call failed. (API:1660)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1657, in setKsmTune
    supervdsm.getProxy().ksmTune(tuningParams)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in ksmTune
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
IOError: [Errno 22] Invalid argument
This seems unrelated, but may be worth investigating by the storage team. +Tal Nisan, can you look into this?


Closer to the failure, on host0, I see:
2020-04-28 23:49:58,775-0400 ERROR (vm/b6ca2e94) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') The vm start process failed (vm:934)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 868, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2895, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-04-29T03:49:55.484660Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2020-04-29T03:49:55.582536Z qemu-kvm: -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,id=ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,bootindex=1,write-cache=on: Failed to get shared "write" lock
Is another process using the image [/var/run/vdsm/storage/fc1a55d5-deb4-4423-be56-e7313645798b/cfb2266f-5d47-4418-b30f-9c1d3fbf512c/68d04a61-9f34-4a1b-8d6e-bca43a7b9339]?
2020-04-28 23:49:58,775-0400 INFO  (vm/b6ca2e94) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Changed state to Down: internal error: qemu unexpectedly closed the monitor: 2020-04-29T03:49:55.484660Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2020-04-29T03:49:55.582536Z qemu-kvm: -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,id=ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,bootindex=1,write-cache=on: Failed to get shared "write" lock
Is another process using the image [/var/run/vdsm/storage/fc1a55d5-deb4-4423-be56-e7313645798b/cfb2266f-5d47-4418-b30f-9c1d3fbf512c/68d04a61-9f34-4a1b-8d6e-bca43a7b9339]? (code=1) (vm:1702)
2020-04-28 23:49:58,799-0400 INFO  (vm/b6ca2e94) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Stopping connection (guestagent:455)
2020-04-28 23:49:58,849-0400 INFO  (jsonrpc/1) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:192.168.200.99,49938, vmId=b6ca2e94-df8b-48e9-b0ee-2bc0f939786a (api:48)
2020-04-28 23:49:58,851-0400 INFO  (jsonrpc/1) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Release VM resources (vm:5186)
2020-04-28 23:49:58,851-0400 WARN  (jsonrpc/1) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') trying to set state to Powering down when already Down (vm:626)
2020-04-28 23:49:58,851-0400 INFO  (jsonrpc/1) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Stopping connection (guestagent:455)
+Ryan Barry, can you check the qemu-kvm warning?

Help understanding why the storage domain became unreachable is welcome.
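
For anyone lining these events up: the excerpts above use three timestamp styles (engine.log with a "-04" offset, vdsm with "-0400", and qemu in UTC with a trailing "Z"). Below is a minimal Python sketch that normalizes them to UTC and sorts them. The helper name parse_log_ts and the event labels are my own, not part of any oVirt tooling, and the 23:59:51 maintenance attempt is left out because no offset is given for it above.

```python
import re
from datetime import datetime, timezone


def parse_log_ts(text):
    """Return a UTC datetime for an engine/vdsm or qemu style timestamp.

    Hypothetical helper: handles only the three formats quoted in this mail.
    """
    if text.endswith("Z"):
        # qemu style: 2020-04-29T03:49:55.582536Z (already UTC)
        naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
        return naive.replace(tzinfo=timezone.utc)
    # engine.log writes the offset as "-04"; pad it to "-0400" for %z
    if re.search(r"[+-]\d{2}$", text):
        text += "00"
    local = datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f%z")
    return local.astimezone(timezone.utc)


# Timestamps copied from the log excerpts; labels are informal summaries.
events = [
    ("2020-04-28 23:48:18,378-04", "engine: host set NonOperational"),
    ("2020-04-28 23:43:04,944-0400", "vdsm: setKsmTune API call failed"),
    ("2020-04-29T03:49:55.582536Z", "qemu: failed to get shared write lock"),
    ("2020-04-28 23:49:58,775-0400", "vdsm: VM start process failed"),
]

timeline = sorted((parse_log_ts(ts), label) for ts, label in events)

for when, label in timeline:
    print(when.isoformat(), label)
```

Sorted this way, the setKsmTune error comes first, then the host going NonOperational, then the qemu lock failure and the failed VM start.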
--

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA

sbonazzo@redhat.com   
