On Sun, Dec 18, 2016 at 4:17 PM, Barak Korren
<bkorren(a)redhat.com> wrote:
> After we've fixed the various system test issues that arose from the
> CentOS 7.3 release, we're now seeing a new failure that seems to have
> to do with snapshot merges.
>
> I'm guessing this may have to do with something that went in last week
> while we "weren't looking".
>
> Failing job can be seen here:
>
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275
>
> The test code snippet that is failing is as follows:
>
> 226 api.vms.get(VM0_NAME).snapshots.list()[-2].delete()
> 227 testlib.assert_true_within_short(
> 228 lambda:
> 229 (len(api.vms.get(VM0_NAME).snapshots.list()) == 2) and
> 230 (api.vms.get(VM0_NAME).snapshots.list()[-1].snapshot_status
> 231 == 'ok'),
> 232 )
>
> The failure itself is a test timeout:
>
> ...
> File "/home/jenkins/workspace/test-repo_ovirt_experimental_
master/ovirt-system-tests/basic-suite-master/test-
scenarios/004_basic_sanity.py",
> line 228, in snapshot_merge
> lambda:
> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
> 248, in assert_true_within_short
> allowed_exceptions=allowed_exceptions,
> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
> 240, in assert_true_within
> raise AssertionError('Timed out after %s seconds' % timeout)
> AssertionError: Timed out after 180 seconds
>
> Engine log generated during the test can be found here:
>
>
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity.py/lago-basic-suite-master-engine/_var_log_ovirt-engine/engine.log
Quickly looking in the engine log does not reveal anything useful.
Understanding issues like this requires the vdsm log from the host
performing the operation.
Looking in the vdsm log:

http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity.py/lago-basic-suite-master-host1/_var_log_vdsm/vdsm.log
2016-12-18 07:54:44,325 INFO (jsonrpc/1) [dispatcher] Run and protect: getDeviceList(storageType=3, guids=(), checkStatus=False, options={}) (logUtils:49)
2016-12-18 07:54:44,565 INFO (jsonrpc/1) [dispatcher] Run and protect: getDeviceList, Return response: {'devList': [
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sdd', 'capacity': '21474836480', 'state': 'active', 'lun': '3'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun3_bdev_b94d3d26-d6e4-4bfc-9d91-4898731b721f', 'GUID': '36001405b94d3d26d6e44bfc9d9148987', 'productID': 'lun3_bdev'},
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sdc', 'capacity': '21474836480', 'state': 'active', 'lun': '2'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun2_bdev_eddd3ce8-2ea8-4326-a9a8-ebc609a768db', 'GUID': '36001405eddd3ce82ea84326a9a8ebc60', 'productID': 'lun2_bdev'},
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun0_bdev_2e4947ab-dee9-47b5-b55c-9e28a18d6831', 'GUID': '360014052e4947abdee947b5b55c9e28a', 'productID': 'lun0_bdev'},
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sdb', 'capacity': '21474836480', 'state': 'active', 'lun': '1'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun1_bdev_c581fee8-4da8-42c8-8414-305e003cd57c', 'GUID': '36001405c581fee84da842c88414305e0', 'productID': 'lun1_bdev'},
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sde', 'capacity': '21474836480', 'state': 'active', 'lun': '4'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun4_bdev_18c78046-b159-4d14-a65e-5f9ad3b5b4f2', 'GUID': '3600140518c78046b1594d14a65e5f9ad', 'productID': 'lun4_bdev'}
]} (logUtils:52)
2016-12-18 07:54:44,567 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.getDeviceList succeeded in 0.24 seconds (__init__:515)
2016-12-18 07:54:45,692 INFO (jsonrpc/2) [dispatcher] Run and protect: getDeviceList(storageType=3, guids=[u'3600140518c78046b1594d14a65e5f9ad', u'360014052e4947abdee947b5b55c9e28a'], checkStatus=False, options={}) (logUtils:49)
2016-12-18 07:54:45,882 WARN (jsonrpc/2) [storage.LVM] lvm pvs failed: 5 [] [' WARNING: Not using lvmetad because config setting use_lvmetad=0.', ' WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).', ' Failed to find physical volume "/dev/mapper/3600140518c78046b1594d14a65e5f9ad".'] (lvm:324)
2016-12-18 07:54:45,883 WARN (jsonrpc/2) [storage.HSM] getPV failed for guid: 3600140518c78046b1594d14a65e5f9ad (hsm:1970)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
pv = lvm.getPV(guid)
File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
raise se.InaccessiblePhysDev((pvName,))
InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=(u'3600140518c78046b1594d14a65e5f9ad',)"
2016-12-18 07:54:45,944 WARN (jsonrpc/2) [storage.LVM] lvm pvs failed: 5 [] [' WARNING: Not using lvmetad because config setting use_lvmetad=0.', ' WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).', ' Failed to find physical volume "/dev/mapper/360014052e4947abdee947b5b55c9e28a".'] (lvm:324)
2016-12-18 07:54:45,944 WARN (jsonrpc/2) [storage.HSM] getPV failed for guid: 360014052e4947abdee947b5b55c9e28a (hsm:1970)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
pv = lvm.getPV(guid)
File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
raise se.InaccessiblePhysDev((pvName,))
InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=(u'360014052e4947abdee947b5b55c9e28a',)"
2016-12-18 07:54:45,977 INFO (jsonrpc/2) [dispatcher] Run and protect: getDeviceList, Return response: {'devList': [
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun0_bdev_2e4947ab-dee9-47b5-b55c-9e28a18d6831', 'GUID': '360014052e4947abdee947b5b55c9e28a', 'productID': 'lun0_bdev'},
{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sde', 'capacity': '21474836480', 'state': 'active', 'lun': '4'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun4_bdev_18c78046-b159-4d14-a65e-5f9ad3b5b4f2', 'GUID': '3600140518c78046b1594d14a65e5f9ad', 'productID': 'lun4_bdev'}
]} (logUtils:52)
2016-12-18 07:54:45,978 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getDeviceList succeeded in 0.29 seconds (__init__:515)
Looks like a pv is not accessible - a storage issue?
Please check that the pvs are clean before using them for the test.
Clearing the luns on the storage before the test could be a good idea.
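Such a cleanup step could look roughly like this (a minimal sketch, not
part of the current suite; the device paths are taken from the multipath
GUIDs in the log above, and the wipefs/dd approach is just one option):

import subprocess

# Hypothetical pre-test cleanup: wipe stale LVM/filesystem signatures
# from the LUNs before the suite touches them.
LUNS = [
    '/dev/mapper/3600140518c78046b1594d14a65e5f9ad',
    '/dev/mapper/360014052e4947abdee947b5b55c9e28a',
]

def wipe_lun(path):
    # "wipefs -a" removes all known signatures (PV labels, filesystems).
    subprocess.check_call(['wipefs', '-a', path])
    # Also zero the first 1 MiB to clear any leftover metadata headers.
    subprocess.check_call(
        ['dd', 'if=/dev/zero', 'of=' + path, 'bs=1M', 'count=1',
         'conv=fsync'])

for lun in LUNS:
    wipe_lun(lun)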
They should ALWAYS be clean, as we start from scratch (including
creation of the devices on the storage) in every run.
2016-12-18 07:54:47,165 INFO (jsonrpc/4) [dispatcher] Run and protect: createVG(vgname=u'097c3032-99da-4e18-b547-19ff340f846e', devlist=[u'3600140518c78046b1594d14a65e5f9ad', u'360014052e4947abdee947b5b55c9e28a'], force=False, options=None) (logUtils:49)
Engine is trying to create a vg with the pv
360014052e4947abdee947b5b55c9e28a that failed before in getDeviceList.
This looks like a bug in the engine: it ignores the results of
getDeviceList.
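The kind of guard that seems to be missing would look roughly like this
(illustrative only - this is not actual engine code, and the failed-GUID
bookkeeping here is a made-up stand-in for whatever the engine records
from getDeviceList):

def check_devices_accessible(failed_guids, devlist):
    # failed_guids: GUIDs for which the preceding getDeviceList call
    # reported a getPV failure (hypothetical bookkeeping).
    bad = [guid for guid in devlist if guid in failed_guids]
    if bad:
        raise RuntimeError(
            'Refusing to create VG with inaccessible PVs: %s' % bad)

# With the log above, both devices would be rejected before createVG:
try:
    check_devices_accessible(
        failed_guids={'3600140518c78046b1594d14a65e5f9ad',
                      '360014052e4947abdee947b5b55c9e28a'},
        devlist=[u'3600140518c78046b1594d14a65e5f9ad',
                 u'360014052e4947abdee947b5b55c9e28a'])
except RuntimeError as e:
    print(e)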
2016-12-18 07:54:47,552 INFO (jsonrpc/4) [dispatcher] Run and protect: createVG, Return response: {'uuid': 'zJyjdF-AvTC-Z6wN-XNRO-1jDd-2SUH-dVQTJW'} (logUtils:52)
2016-12-18 07:54:47,552 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call LVMVolumeGroup.create succeeded in 0.38 seconds (__init__:515)
2016-12-18 07:54:48,562 INFO (jsonrpc/5) [dispatcher] Run and protect: createStorageDomain(storageType=3, sdUUID=u'097c3032-99da-4e18-b547-19ff340f846e', domainName=u'iscsi', typeSpecificArg=u'zJyjdF-AvTC-Z6wN-XNRO-1jDd-2SUH-dVQTJW', domClass=1, domVersion=u'3', options=None) (logUtils:49)
2016-12-18 07:54:48,563 ERROR (jsonrpc/5) [storage.StorageDomainCache] looking for unfetched domain 097c3032-99da-4e18-b547-19ff340f846e (sdc:151)
2016-12-18 07:54:48,563 ERROR (jsonrpc/5) [storage.StorageDomainCache] looking for domain 097c3032-99da-4e18-b547-19ff340f846e (sdc:168)
2016-12-18 07:54:48,564 ERROR (jsonrpc/5) [storage.StorageDomainCache] domain 097c3032-99da-4e18-b547-19ff340f846e not found (sdc:157)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
dom = findMethod(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'097c3032-99da-4e18-b547-19ff340f846e',)
We see a lot of these errors in the rest of the log. This means
something is wrong with this vg.
This needs deeper investigation from a storage developer on both the
engine and vdsm side, but I would start by making sure we use clean
luns. We are not trying to test esoteric negative flows in the system
tests.
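To verify the luns are clean up front, the suite setup could assert
something like this (a sketch; it assumes the host sees the /dev/mapper
paths below and relies on "pvs <device>" exiting non-zero for a device
that carries no PV label):

import subprocess

def is_clean_lun(path):
    # A clean LUN has no PV label, so "pvs <device>" fails on it.
    return subprocess.call(['pvs', path]) != 0

for path in ['/dev/mapper/3600140518c78046b1594d14a65e5f9ad',
             '/dev/mapper/360014052e4947abdee947b5b55c9e28a']:
    assert is_clean_lun(path), '%s still carries a PV label' % path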
Did we change something in the system tests project or lago while we
were not looking?
CentOS 7.3... ?
Y.