Quickly looking in the engine log does not reveal anything useful.On Sun, Dec 18, 2016 at 4:17 PM, Barak Korren <bkorren@redhat.com> wrote:
> After we've fixed the various system test issues that arose from the
> CentOS 7.3 release, we're now seeing a new failure that seems to has
> to do with snapshot merges.
>
> I'm guessing this may have to do with something the went in last week
> while we "weren't looking".
>
> Failing job can be seen here:
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_ master/4275
>
> The test code snippet that is failing is as follows:
>
> 226 api.vms.get(VM0_NAME).snapshots.list()[-2].delete()
> 227 testlib.assert_true_within_short(
> 228 lambda:
> 229 (len(api.vms.get(VM0_NAME).snapshots.list()) == 2) and
> 230 (api.vms.get(VM0_NAME).snapshots.list()[-1].snapshot_ status
> 231 == 'ok'),
> 232 )
>
> The failure itself is a test timeout:
>
> ...
> File "/home/jenkins/workspace/test-repo_ovirt_experimental_ master/ovirt-system-tests/ basic-suite-master/test- scenarios/004_basic_sanity.py" ,
> line 228, in snapshot_merge
> lambda:
> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py" , line
> 248, in assert_true_within_short
> allowed_exceptions=allowed_exceptions,
> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py" , line
> 240, in assert_true_within
> raise AssertionError('Timed out after %s seconds' % timeout)
> AssertionError: Timed out after 180 seconds
>
> Engine log generated during the test can be found here:
>
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_ master/4275/artifact/exported- artifacts/basic_suite_master. sh-el7/exported-artifacts/ test_logs/basic-suite-master/ post-004_basic_sanity.py/lago- basic-suite-master-engine/_ var_log_ovirt-engine/engine. log
Understanding issues like this require vdsm log from the host
performing the operation.
Looking in vdsm log:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_ master/4275/artifact/exported- artifacts/basic_suite_master. sh-el7/exported-artifacts/ test_logs/basic-suite-master/ post-004_basic_sanity.py/lago- basic-suite-master-host1/_var_ log_vdsm/vdsm.log
2016-12-18 07:54:44,325 INFO (jsonrpc/1) [dispatcher] Run and
protect: getDeviceList(storageType=3, guids=(), checkStatus=False,
options={}) (logUtils:49)
2016-12-18 07:54:44,565 INFO (jsonrpc/1) [dispatcher] Run and
protect: getDeviceList, Return response: {'devList': [{'status':
'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev':
'4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '',
'pathlist': [{'initiatorname': 'default', 'connection':
'192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
'1', 'user': 'username', 'password': '********', 'port': '3260'}],
'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
[{'type': 'iSCSI', 'physdev': 'sdd', 'capacity': '21474836480',
'state': 'active', 'lun': '3'}], 'devtype': 'iSCSI',
'physicalblocksize': '512', 'pvUUID': '', 'serial':
'SLIO-ORG_lun3_bdev_b94d3d26-d6e4-4bfc-9d91-4898731b721f', 'GUID':
'36001405b94d3d26d6e44bfc9d9148 987', 'productID': 'lun3_bdev'},
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity':
'21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '',
'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection':
'192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
'1', 'user': 'username', 'password': '********', 'port': '3260'}],
'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
[{'type': 'iSCSI', 'physdev': 'sdc', 'capacity': '21474836480',
'state': 'active', 'lun': '2'}], 'devtype': 'iSCSI',
'physicalblocksize': '512', 'pvUUID': '', 'serial':
'SLIO-ORG_lun2_bdev_eddd3ce8-2ea8-4326-a9a8-ebc609a768db', 'GUID':
'36001405eddd3ce82ea84326a9a8eb c60', 'productID': 'lun2_bdev'},
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity':
'21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '',
'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection':
'192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
'1', 'user': 'username', 'password': '********', 'port': '3260'}],
'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
[{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480',
'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI',
'physicalblocksize': '512', 'pvUUID': '', 'serial':
'SLIO-ORG_lun0_bdev_2e4947ab-dee9-47b5-b55c-9e28a18d6831', 'GUID':
'360014052e4947abdee947b5b55c9e 28a', 'productID': 'lun0_bdev'},
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity':
'21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '',
'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection':
'192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
'1', 'user': 'username', 'password': '********', 'port': '3260'}],
'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
[{'type': 'iSCSI', 'physdev': 'sdb', 'capacity': '21474836480',
'state': 'active', 'lun': '1'}], 'devtype': 'iSCSI',
'physicalblocksize': '512', 'pvUUID': '', 'serial':
'SLIO-ORG_lun1_bdev_c581fee8-4da8-42c8-8414-305e003cd57c', 'GUID':
'36001405c581fee84da842c8841430 5e0', 'productID': 'lun1_bdev'},
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity':
'21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '',
'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection':
'192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
'1', 'user': 'username', 'password': '********', 'port': '3260'}],
'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
[{'type': 'iSCSI', 'physdev': 'sde', 'capacity': '21474836480',
'state': 'active', 'lun': '4'}], 'devtype': 'iSCSI',
'physicalblocksize': '512', 'pvUUID': '', 'serial':
'SLIO-ORG_lun4_bdev_18c78046-b159-4d14-a65e-5f9ad3b5b4f2', 'GUID':
'3600140518c78046b1594d14a65e5f 9ad', 'productID': 'lun4_bdev'}]}
(logUtils:52)
2016-12-18 07:54:44,567 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC
call Host.getDeviceList succeeded in 0.24 seconds (__init__:515)
2016-12-18 07:54:45,692 INFO (jsonrpc/2) [dispatcher] Run and
protect: getDeviceList(storageType=3,
guids=[u'3600140518c78046b1594d14a65e5f 9ad',
u'360014052e4947abdee947b5b55c9e 28a'], checkStatus=False, options={})
(logUtils:49)
2016-12-18 07:54:45,882 WARN (jsonrpc/2) [storage.LVM] lvm pvs
failed: 5 [] [' WARNING: Not using lvmetad because config setting
use_lvmetad=0.', ' WARNING: To avoid corruption, rescan devices to
make changes visible (pvscan --cache).', ' Failed to find physical
volume "/dev/mapper/3600140518c78046b1594d14a65e5f 9ad".'] (lvm:324)
2016-12-18 07:54:45,883 WARN (jsonrpc/2) [storage.HSM] getPV failed
for guid: 3600140518c78046b1594d14a65e5f9ad (hsm:1970)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
pv = lvm.getPV(guid)
File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
raise se.InaccessiblePhysDev((pvName,))
InaccessiblePhysDev: Multipath cannot access physical device(s):
"devices=(u'3600140518c78046b1594d14a65e5f 9ad',)"
2016-12-18 07:54:45,944 WARN (jsonrpc/2) [storage.LVM] lvm pvs
failed: 5 [] [' WARNING: Not using lvmetad because config setting
use_lvmetad=0.', ' WARNING: To avoid corruption, rescan devices to
make changes visible (pvscan --cache).', ' Failed to find physical
volume "/dev/mapper/360014052e4947abdee947b5b55c9e 28a".'] (lvm:324)
2016-12-18 07:54:45,944 WARN (jsonrpc/2) [storage.HSM] getPV failed
for guid: 360014052e4947abdee947b5b55c9e28a (hsm:1970)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
pv = lvm.getPV(guid)
File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
raise se.InaccessiblePhysDev((pvName,))
InaccessiblePhysDev: Multipath cannot access physical device(s):
"devices=(u'360014052e4947abdee947b5b55c9e 28a',)"
2016-12-18 07:54:45,977 INFO (jsonrpc/2) [dispatcher] Run and
protect: getDeviceList, Return response: {'devList': [{'status':
'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev':
'4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '',
'pathlist': [{'initiatorname': 'default', 'connection':
'192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
'1', 'user': 'username', 'password': '********', 'port': '3260'}],
'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
[{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480',
'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI',
'physicalblocksize': '512', 'pvUUID': '', 'serial':
'SLIO-ORG_lun0_bdev_2e4947ab-dee9-47b5-b55c-9e28a18d6831', 'GUID':
'360014052e4947abdee947b5b55c9e 28a', 'productID': 'lun0_bdev'},
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity':
'21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '',
'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection':
'192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
'1', 'user': 'username', 'password': '********', 'port': '3260'}],
'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
[{'type': 'iSCSI', 'physdev': 'sde', 'capacity': '21474836480',
'state': 'active', 'lun': '4'}], 'devtype': 'iSCSI',
'physicalblocksize': '512', 'pvUUID': '', 'serial':
'SLIO-ORG_lun4_bdev_18c78046-b159-4d14-a65e-5f9ad3b5b4f2', 'GUID':
'3600140518c78046b1594d14a65e5f 9ad', 'productID': 'lun4_bdev'}]}
(logUtils:52)
2016-12-18 07:54:45,978 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC
call Host.getDeviceList succeeded in 0.29 seconds (__init__:515)
Looks a pv is not accessible - storage issue?
Please check that the pvs are clean before using them for the test.
Clearing the luns on the the storage before the test can be a good idea.
2016-12-18 07:54:47,165 INFO (jsonrpc/4) [dispatcher] Run and
protect: createVG(vgname=u'097c3032-99da-4e18-b547-19ff340f846e',
devlist=[u'3600140518c78046b1594d14a65e5f 9ad',
u'360014052e4947abdee947b5b55c9e 28a'], force=False, options=None)
(logUtils:49)
Engine is trying to create a vg with the pv
360014052e4947abdee947b5b55c9e28a that
fail before in getDeviceList.
This looks like a bug in engine, ignoring the results of getDeviceList.
2016-12-18 07:54:47,552 INFO (jsonrpc/4) [dispatcher] Run and
protect: createVG, Return response: {'uuid':
'zJyjdF-AvTC-Z6wN-XNRO-1jDd-2SUH-dVQTJW'} (logUtils:52)
2016-12-18 07:54:47,552 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC
call LVMVolumeGroup.create succeeded in 0.38 seconds (__init__:515)
2016-12-18 07:54:48,562 INFO (jsonrpc/5) [dispatcher] Run and
protect: createStorageDomain(storageType=3,
sdUUID=u'097c3032-99da-4e18-b547-19ff340f846e', domainName=u'iscsi',
typeSpecificArg=u'zJyjdF-AvTC-Z6wN-XNRO-1jDd-2SUH-dVQTJW', domClass=1,
domVersion=u'3', options=None) (logUtils:49)
2016-12-18 07:54:48,563 ERROR (jsonrpc/5) [storage.StorageDomainCache]
looking for unfetched domain 097c3032-99da-4e18-b547-19ff340f846e
(sdc:151)
2016-12-18 07:54:48,563 ERROR (jsonrpc/5) [storage.StorageDomainCache]
looking for domain 097c3032-99da-4e18-b547-19ff340f846e (sdc:168)
2016-12-18 07:54:48,564 ERROR (jsonrpc/5) [storage.StorageDomainCache]
domain 097c3032-99da-4e18-b547-19ff340f846e not found (sdc:157)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
dom = findMethod(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist:
(u'097c3032-99da-4e18-b547-19ff340f846e',)
We a lot of these errors in the rest of the log. This meas something
is wrong with this vg.
Needs deeper investigation from storage developer on both engine and vdsm side,
but I would start by making sure we use clean luns. We are not trying
to test esoteric
negative flows in the system tests.
Did we change something in the system tests project or lago while we
were not looking?
Can we reproduce this issue manually with same engine and vdsm versions?
Nir
>
> Please have a look.
>
> Thanks,
> Barak.
>
>
> --
> Barak Korren
> bkorren@redhat.com
> RHCE, RHCi, RHV-DevOps Team
> https://ifireball.wordpress.com/
> _______________________________________________
> Infra mailing list
> Infra@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra