Debugging
https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.3/427
In engine.log, at 2020-04-28 23:48:18,378-04, I see:
SetVdsStatusVDSCommandParameters:{
hostId='b34db269-5351-4653-9a0c-90a9154cd687',
status='NonOperational',
nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE',
stopSpmFailureLogged='false',
maintenanceReason='null'}
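That means the engine flagged the host (hostId b34db269-5351-4653-9a0c-90a9154cd687) as NonOperational because a storage domain was unreachable from it. For reference, the same status can be queried from the engine API; here is a minimal sketch with the oVirt Python SDK, where the engine URL and credentials are just placeholders for the OST environment:

# Sketch only: list host statuses from the engine API.
# Engine URL and credentials below are placeholders, not the OST values.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine/ovirt-engine/api',  # placeholder engine address
    username='admin@internal',
    password='CHANGE_ME',                   # placeholder
    insecure=True,
)
try:
    hosts_service = connection.system_service().hosts_service()
    for host in hosts_service.list():
        # The affected host should report non_operational while the
        # storage domain is unreachable, matching the engine.log entry above.
        print('{}: {}'.format(host.name, host.status))
finally:
    connection.close()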
So, when the test tries to put host1 into local maintenance at 2020-04-28
23:59:51, it fails with:
Validation of action 'MaintenanceNumberOfVdss' failed for user
admin@internal-authz. Reasons:
VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_NO_ALTERNATE_HOST_FOR_HOSTED_ENGINE
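That validation error means the engine found no other host able to run the hosted-engine VM, so host1 is not allowed to enter maintenance. The per-host HA scores can be checked with hosted-engine --vm-status; a rough sketch parsing its JSON output (field names from memory, so please double-check them):

# Sketch only: check the hosted-engine HA view to see why no alternate
# host is available. Assumes ovirt-hosted-engine-ha is installed on the host.
import json
import subprocess

out = subprocess.check_output(['hosted-engine', '--vm-status', '--json'])
status = json.loads(out)
for key, info in status.items():
    if not isinstance(info, dict):  # skip flags such as 'global_maintenance'
        continue
    # A host with score 0 cannot take over the hosted-engine VM.
    print('{}: hostname={} score={} engine-status={}'.format(
        key, info.get('hostname'), info.get('score'), info.get('engine-status')))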
vdsm on host0 shows a traceback:
2020-04-28 23:43:04,944-0400 ERROR (jsonrpc/0) [vds] setKsmTune API
call failed. (API:1660)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1657, in setKsmTune
supervdsm.getProxy().ksmTune(tuningParams)
File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py",
line 56, in __call__
return callMethod()
File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py",
line 54, in <lambda>
**kwargs)
File "<string>", line 2, in ksmTune
File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773,
in _callmethod
raise convert_to_error(kind, result)
IOError: [Errno 22] Invalid argument
This looks unrelated, but it may be worth investigating by the storage
team. +Tal Nisan <tnisan(a)redhat.com>, can you look into this?
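For context, as far as I can tell supervdsm's ksmTune just writes the MoM-provided tuning values into /sys/kernel/mm/ksm, roughly like the sketch below (not the actual vdsm code), so the EINVAL should mean the kernel rejected one of the values:

# Sketch, not the actual vdsm code: KSM tuning boils down to writing the
# values MoM computed into the KSM sysfs nodes; the kernel returns EINVAL
# (IOError: [Errno 22]) when it rejects a value, e.g. a non-integer.
KSM_TUNING = {'run': 1, 'pages_to_scan': 64, 'sleep_millisecs': 10}  # example values

for key, value in KSM_TUNING.items():
    with open('/sys/kernel/mm/ksm/%s' % key, 'w') as node:
        node.write(str(int(value)))  # sysfs only accepts plain integers here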
Closer to the failure, on host0 I see:
2020-04-28 23:49:58,775-0400 ERROR (vm/b6ca2e94) [virt.vm]
(vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') The vm start process
failed (vm:934)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 868,
in _startUnderlyingVm
self._run()
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2895, in _run
dom.createWithFlags(flags)
File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py",
line 131, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/common/function.py",
line 94, in wrapper
return func(inst, *args, **kwargs)
File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in
createWithFlags
if ret == -1: raise libvirtError ('virDomainCreateWithFlags()
failed', dom=self)
libvirtError: internal error: qemu unexpectedly closed the monitor:
2020-04-29T03:49:55.484660Z qemu-kvm: warning: All CPU(s) up to
maxcpus should be described in NUMA config, ability to start up with
partial NUMA mappings is obsoleted and will be removed in future
2020-04-29T03:49:55.582536Z qemu-kvm: -device
virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,id=ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,bootindex=1,write-cache=on:
Failed to get shared "write" lock
Is another process using the image
[/var/run/vdsm/storage/fc1a55d5-deb4-4423-be56-e7313645798b/cfb2266f-5d47-4418-b30f-9c1d3fbf512c/68d04a61-9f34-4a1b-8d6e-bca43a7b9339]?
2020-04-28 23:49:58,775-0400 INFO (vm/b6ca2e94) [virt.vm]
(vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Changed state to Down:
internal error: qemu unexpectedly closed the monitor:
2020-04-29T03:49:55.484660Z qemu-kvm: warning: All CPU(s) up to
maxcpus should be described in NUMA config, ability to start up with
partial NUMA mappings is obsoleted and will be removed in future
2020-04-29T03:49:55.582536Z qemu-kvm: -device
virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,id=ua-cfb2266f-5d47-4418-b30f-9c1d3fbf512c,bootindex=1,write-cache=on:
Failed to get shared "write" lock
Is another process using the image
[/var/run/vdsm/storage/fc1a55d5-deb4-4423-be56-e7313645798b/cfb2266f-5d47-4418-b30f-9c1d3fbf512c/68d04a61-9f34-4a1b-8d6e-bca43a7b9339]?
(code=1) (vm:1702)
2020-04-28 23:49:58,799-0400 INFO (vm/b6ca2e94) [virt.vm]
(vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Stopping connection
(guestagent:455)
2020-04-28 23:49:58,849-0400 INFO (jsonrpc/1) [api.virt] START
destroy(gracefulAttempts=1) from=::ffff:192.168.200.99,49938,
vmId=b6ca2e94-df8b-48e9-b0ee-2bc0f939786a (api:48)
2020-04-28 23:49:58,851-0400 INFO (jsonrpc/1) [virt.vm]
(vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Release VM resources
(vm:5186)
2020-04-28 23:49:58,851-0400 WARN (jsonrpc/1) [virt.vm]
(vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') trying to set state to
Powering down when already Down (vm:626)
2020-04-28 23:49:58,851-0400 INFO (jsonrpc/1) [virt.vm]
(vmId='b6ca2e94-df8b-48e9-b0ee-2bc0f939786a') Stopping connection
(guestagent:455)
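The part that actually kills the VM start is the 'Failed to get shared "write" lock' error: since qemu 2.10 disk images are protected by file locks, so some other process still had the hosted-engine disk open for writing. A quick sketch (assuming fuser from psmisc and lslocks from util-linux are available on the host) to find the holder:

# Sketch: find which process still holds the hosted-engine disk image open.
import subprocess

IMAGE = ('/var/run/vdsm/storage/fc1a55d5-deb4-4423-be56-e7313645798b/'
         'cfb2266f-5d47-4418-b30f-9c1d3fbf512c/'
         '68d04a61-9f34-4a1b-8d6e-bca43a7b9339')

subprocess.call(['fuser', '-v', IMAGE])                 # PIDs with the image open
subprocess.call('lslocks | grep -i qemu', shell=True)   # locks held by qemu processes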
+Ryan Barry <rbarry(a)redhat.com>, can you check the qemu-kvm warning?
Help understanding why the storage domain became unreachable is welcome.
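In case it helps whoever picks this up: vdsm reports per-domain monitoring results in its repoStats responses, so scanning vdsm.log on host0 for domains reported as not valid should show when the domain stopped being reachable. A rough sketch (log format from memory, so double-check the matching):

# Rough sketch: find repoStats results where a storage domain is reported
# as not valid, to narrow down when it became unreachable on host0.
with open('/var/log/vdsm/vdsm.log') as log:
    for line in log:
        if 'repoStats' in line and "'valid': False" in line:
            print(line.rstrip())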
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.