ovirt-system-tests_he-basic-suite-4.3 fails on storage domain unreachable

Debugging https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.3/427, in engine.log at 2020-04-28 23:48:18,378-04 I see:

    SetVdsStatusVDSCommandParameters:{ hostId='b34db269-5351-4653-9a0c-90a9154cd687', status='NonOperational', nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE', stopSpmFailureLogged='false', maintenanceReason='null'}

So when the test tries to put host1 into local maintenance at 2020-04-28 23:59:51, it fails with:

    Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_NO_ALTERNATE_HOST_FOR_HOSTED_ENGINE

vdsm on host0 shows a traceback:

    2020-04-28 23:43:04,944-0400 ERROR (jsonrpc/0) [vds] setKsmTune API call failed. (API:1660)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1657, in setKsmTune
        supervdsm.getProxy().ksmTune(tuningParams)
      File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
        return callMethod()
      File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
        **kwargs)
      File "<string>", line 2, in ksmTune
      File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
        raise convert_to_error(kind, result)
    IOError: [Errno 22] Invalid argument

This seems unrelated, but may be worth investigating by the storage team.
+Tal Nisan <tnisan@redhat.com> can you look into this?

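For anyone who wants to find every point where the engine marked a host NonOperational, here is a minimal Python sketch that scans engine.log lines for SetVdsStatusVDSCommandParameters entries. It assumes each log line starts with a timestamp in the form quoted above and that parameters are written as key='value'. The function name and the SAMPLE line are hypothetical; the part of the line between the timestamp and the parameters is elided in SAMPLE because it is not quoted in this mail.

```python
import re
from datetime import datetime

# Assumed timestamp prefix, e.g. "2020-04-28 23:48:18,378-04".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")
# Parameters are assumed to appear as key='value' pairs.
FIELD = re.compile(r"(\w+)='([^']*)'")

# Hypothetical sample assembled from the fragments quoted in this mail;
# the " ... " stands for the part of the log line that is not quoted.
SAMPLE = (
    "2020-04-28 23:48:18,378-04 ... SetVdsStatusVDSCommandParameters:{ "
    "hostId='b34db269-5351-4653-9a0c-90a9154cd687', status='NonOperational', "
    "nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE', "
    "stopSpmFailureLogged='false', maintenanceReason='null'}"
)


def non_operational_events(lines):
    """Yield (timestamp, fields) for each status change to NonOperational."""
    for line in lines:
        if "SetVdsStatusVDSCommandParameters" not in line:
            continue
        ts = TS.match(line)
        fields = dict(FIELD.findall(line))
        if ts and fields.get("status") == "NonOperational":
            when = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S")
            yield when, fields


if __name__ == "__main__":
    for when, fields in non_operational_events([SAMPLE]):
        print(when, fields["hostId"], fields["nonOperationalReason"])
```

Feeding it the lines of a real engine.log (for example via open(path)) should list each transition with its host ID and reason, which makes it easier to line up against the vdsm timestamps.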
Closer to the failure, on host0 I see:

    2020-04-28 23:49:58,775-0400 ERROR (vm/b6ca2e94) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') The vm start process failed (vm:934)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 868, in _startUnderlyingVm
        self._run()
      File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2895, in _run
        dom.createWithFlags(flags)
      File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
        ret = f(*args, **kwargs)
      File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
        return func(inst, *args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags
        if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
    libvirtError: internal error: qemu unexpectedly closed the monitor:
    2020-04-29T03:49:55.484660Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
    2020-04-29T03:49:55.582536Z qemu-kvm: -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,id=ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,bootindex=1,write-cache=on: Failed to get shared "write" lock
    Is another process using the image [/var/run/vdsm/storage/fc1a55d5-deb4-4423-be56-e7313645798b/cfb2266f-5d47-4418-b30f-9c1d3fbf512c/68d04a61-9f34-4a1b-8d6e-bca43a7b9339]?

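Since qemu asks whether another process is using the image, one quick local check is to look for processes on the host that have that file open. The following is only a sketch, assuming a Linux host with /proc mounted and enough privileges (normally root) to read other processes' file descriptors; the function name is hypothetical. It cannot see holders on other hosts that share the same storage, so an empty result does not rule those out.

```python
import os


def processes_holding(path):
    """Return PIDs on this host that currently have `path` open.

    Scans /proc/<pid>/fd and compares each descriptor's target with the
    fully resolved path, so symlinked image paths are handled as well.
    """
    target = os.path.realpath(path)
    holders = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            # Process exited in the meantime, or we lack permission.
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    holders.append(int(pid))
                    break
            except OSError:
                # Descriptor was closed while we were scanning.
                continue
    return holders


if __name__ == "__main__":
    import sys

    for pid in processes_holding(sys.argv[1]):
        print(pid)
```

Run as root on the host with the image path from the error message as the only argument; any PID printed can then be inspected with ps.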
vdsm then marks the VM as Down and releases its resources:

    2020-04-28 23:49:58,775-0400 INFO (vm/b6ca2e94) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Changed state to Down: internal error: qemu unexpectedly closed the monitor:
    2020-04-29T03:49:55.484660Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
    2020-04-29T03:49:55.582536Z qemu-kvm: -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,id=ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,bootindex=1,write-cache=on: Failed to get shared "write" lock
    Is another process using the image [/var/run/vdsm/storage/fc1a55d5-deb4-4423-be56-e7313645798b/cfb2266f-5d47-4418-b30f-9c1d3fbf512c/68d04a61-9f34-4a1b-8d6e-bca43a7b9339]? (code=1) (vm:1702)
    2020-04-28 23:49:58,799-0400 INFO (vm/b6ca2e94) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Stopping connection (guestagent:455)
    2020-04-28 23:49:58,849-0400 INFO (jsonrpc/1) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:192.168.200.99,49938, vmId=b6ca2e94-df8b-48e9-b0ee-2bc0f939786a (api:48)
    2020-04-28 23:49:58,851-0400 INFO (jsonrpc/1) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Release VM resources (vm:5186)
    2020-04-28 23:49:58,851-0400 WARN (jsonrpc/1) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') trying to set state to Powering down when already Down (vm:626)
    2020-04-28 23:49:58,851-0400 INFO (jsonrpc/1) [virt.vm] (vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Stopping connection (guestagent:455)

+Ryan Barry <rbarry@redhat.com> can you check the qemu-kvm warning?

Help understanding why the storage domain became unreachable is welcome.

-- 
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com

Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.