On Sun, Dec 18, 2016 at 4:17 PM, Barak Korren <bkorren(a)redhat.com> wrote:
After we've fixed the various system test issues that arose from the
CentOS 7.3 release, we're now seeing a new failure that seems to have
to do with snapshot merges.
I'm guessing this may have to do with something that went in last week
while we "weren't looking".
The failing job can be seen here:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275
The test code snippet that is failing is as follows:
226     api.vms.get(VM0_NAME).snapshots.list()[-2].delete()
227     testlib.assert_true_within_short(
228         lambda:
229         (len(api.vms.get(VM0_NAME).snapshots.list()) == 2) and
230         (api.vms.get(VM0_NAME).snapshots.list()[-1].snapshot_status
231          == 'ok'),
232     )
The failure itself is a test timeout:
...
File
"/home/jenkins/workspace/test-repo_ovirt_experimental_master/ovirt-system-tests/basic-suite-master/test-scenarios/004_basic_sanity.py",
line 228, in snapshot_merge
lambda:
File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
248, in assert_true_within_short
allowed_exceptions=allowed_exceptions,
File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
240, in assert_true_within
raise AssertionError('Timed out after %s seconds' % timeout)
AssertionError: Timed out after 180 seconds
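
For context, assert_true_within_short polls the condition until a timeout
rather than asserting once, which is why the failure shows up as a
180-second timeout instead of an immediate assertion. A minimal sketch of
that kind of helper (the real ovirtlago testlib differs in details such as
allowed exceptions and the poll interval, which is an assumption here):

    import time

    SHORT_TIMEOUT = 180  # seconds, matching the error message above

    def assert_true_within(condition, timeout):
        # Poll the condition until it holds or the deadline passes.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if condition():
                return
            time.sleep(3)  # poll interval is an assumption, not the real value
        raise AssertionError('Timed out after %s seconds' % timeout)

    def assert_true_within_short(condition):
        assert_true_within(condition, SHORT_TIMEOUT)
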
Engine log generated during the test can be found here:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275/art...
Quickly looking in the engine log does not reveal anything useful.
Understanding issues like this requires the vdsm log from the host
performing the operation.
Looking in the vdsm log:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275/art...
2016-12-18 07:54:44,325 INFO (jsonrpc/1) [dispatcher] Run and protect:
getDeviceList(storageType=3, guids=(), checkStatus=False, options={}) (logUtils:49)
2016-12-18 07:54:44,565 INFO (jsonrpc/1) [dispatcher] Run and protect:
getDeviceList, Return response: {'devList': [
  {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480',
   'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '',
   'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3',
   'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username',
   'password': '********', 'port': '3260'}],
   'logicalblocksize': '512', 'discard_max_bytes': 0,
   'pathstatus': [{'type': 'iSCSI', 'physdev': 'sdd', 'capacity': '21474836480',
   'state': 'active', 'lun': '3'}],
   'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '',
   'serial': 'SLIO-ORG_lun3_bdev_b94d3d26-d6e4-4bfc-9d91-4898731b721f',
   'GUID': '36001405b94d3d26d6e44bfc9d9148987', 'productID': 'lun3_bdev'},
  {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480',
   'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '',
   'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3',
   'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username',
   'password': '********', 'port': '3260'}],
   'logicalblocksize': '512', 'discard_max_bytes': 0,
   'pathstatus': [{'type': 'iSCSI', 'physdev': 'sdc', 'capacity': '21474836480',
   'state': 'active', 'lun': '2'}],
   'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '',
   'serial': 'SLIO-ORG_lun2_bdev_eddd3ce8-2ea8-4326-a9a8-ebc609a768db',
   'GUID': '36001405eddd3ce82ea84326a9a8ebc60', 'productID': 'lun2_bdev'},
  {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480',
   'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '',
   'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3',
   'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username',
   'password': '********', 'port': '3260'}],
   'logicalblocksize': '512', 'discard_max_bytes': 0,
   'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480',
   'state': 'active', 'lun': '0'}],
   'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '',
   'serial': 'SLIO-ORG_lun0_bdev_2e4947ab-dee9-47b5-b55c-9e28a18d6831',
   'GUID': '360014052e4947abdee947b5b55c9e28a', 'productID': 'lun0_bdev'},
  {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480',
   'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '',
   'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3',
   'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username',
   'password': '********', 'port': '3260'}],
   'logicalblocksize': '512', 'discard_max_bytes': 0,
   'pathstatus': [{'type': 'iSCSI', 'physdev': 'sdb', 'capacity': '21474836480',
   'state': 'active', 'lun': '1'}],
   'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '',
   'serial': 'SLIO-ORG_lun1_bdev_c581fee8-4da8-42c8-8414-305e003cd57c',
   'GUID': '36001405c581fee84da842c88414305e0', 'productID': 'lun1_bdev'},
  {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480',
   'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '',
   'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3',
   'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username',
   'password': '********', 'port': '3260'}],
   'logicalblocksize': '512', 'discard_max_bytes': 0,
   'pathstatus': [{'type': 'iSCSI', 'physdev': 'sde', 'capacity': '21474836480',
   'state': 'active', 'lun': '4'}],
   'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '',
   'serial': 'SLIO-ORG_lun4_bdev_18c78046-b159-4d14-a65e-5f9ad3b5b4f2',
   'GUID': '3600140518c78046b1594d14a65e5f9ad', 'productID': 'lun4_bdev'}]}
(logUtils:52)
2016-12-18 07:54:44,567 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC
call Host.getDeviceList succeeded in 0.24 seconds (__init__:515)
2016-12-18 07:54:45,692 INFO (jsonrpc/2) [dispatcher] Run and
protect: getDeviceList(storageType=3,
guids=[u'3600140518c78046b1594d14a65e5f9ad',
u'360014052e4947abdee947b5b55c9e28a'], checkStatus=False, options={})
(logUtils:49)
2016-12-18 07:54:45,882 WARN (jsonrpc/2) [storage.LVM] lvm pvs
failed: 5 [] [' WARNING: Not using lvmetad because config setting
use_lvmetad=0.', ' WARNING: To avoid corruption, rescan devices to
make changes visible (pvscan --cache).', ' Failed to find physical
volume "/dev/mapper/3600140518c78046b1594d14a65e5f9ad".'] (lvm:324)
2016-12-18 07:54:45,883 WARN (jsonrpc/2) [storage.HSM] getPV failed
for guid: 3600140518c78046b1594d14a65e5f9ad (hsm:1970)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
pv = lvm.getPV(guid)
File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
raise se.InaccessiblePhysDev((pvName,))
InaccessiblePhysDev: Multipath cannot access physical device(s):
"devices=(u'3600140518c78046b1594d14a65e5f9ad',)"
2016-12-18 07:54:45,944 WARN (jsonrpc/2) [storage.LVM] lvm pvs
failed: 5 [] [' WARNING: Not using lvmetad because config setting
use_lvmetad=0.', ' WARNING: To avoid corruption, rescan devices to
make changes visible (pvscan --cache).', ' Failed to find physical
volume "/dev/mapper/360014052e4947abdee947b5b55c9e28a".'] (lvm:324)
2016-12-18 07:54:45,944 WARN (jsonrpc/2) [storage.HSM] getPV failed
for guid: 360014052e4947abdee947b5b55c9e28a (hsm:1970)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
pv = lvm.getPV(guid)
File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
raise se.InaccessiblePhysDev((pvName,))
InaccessiblePhysDev: Multipath cannot access physical device(s):
"devices=(u'360014052e4947abdee947b5b55c9e28a',)"
2016-12-18 07:54:45,977 INFO (jsonrpc/2) [dispatcher] Run and protect:
getDeviceList, Return response: {'devList': [
  {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480',
   'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '',
   'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3',
   'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username',
   'password': '********', 'port': '3260'}],
   'logicalblocksize': '512', 'discard_max_bytes': 0,
   'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480',
   'state': 'active', 'lun': '0'}],
   'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '',
   'serial': 'SLIO-ORG_lun0_bdev_2e4947ab-dee9-47b5-b55c-9e28a18d6831',
   'GUID': '360014052e4947abdee947b5b55c9e28a', 'productID': 'lun0_bdev'},
  {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480',
   'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '',
   'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3',
   'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username',
   'password': '********', 'port': '3260'}],
   'logicalblocksize': '512', 'discard_max_bytes': 0,
   'pathstatus': [{'type': 'iSCSI', 'physdev': 'sde', 'capacity': '21474836480',
   'state': 'active', 'lun': '4'}],
   'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '',
   'serial': 'SLIO-ORG_lun4_bdev_18c78046-b159-4d14-a65e-5f9ad3b5b4f2',
   'GUID': '3600140518c78046b1594d14a65e5f9ad', 'productID': 'lun4_bdev'}]}
(logUtils:52)
2016-12-18 07:54:45,978 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC
call Host.getDeviceList succeeded in 0.29 seconds (__init__:515)
Looks like a pv is not accessible - a storage issue?
Please check that the pvs are clean before using them for the test.
Clearing the luns on the storage before the test would be a good idea.
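
For example, the test setup could wipe leftover LVM metadata from the LUNs
before they are used. A rough sketch only - the device paths and the wipe
method are my assumptions, this is not existing ovirt-system-tests code,
and on a real setup it may be simpler to recreate the backing store on the
storage server instead:

    import subprocess

    # GUIDs taken from the log above; on the host they appear as multipath devices.
    LUN_GUIDS = [
        '3600140518c78046b1594d14a65e5f9ad',
        '360014052e4947abdee947b5b55c9e28a',
    ]

    def wipe_lun(guid):
        dev = '/dev/mapper/%s' % guid
        # Remove any filesystem/LVM signatures left over from a previous run.
        subprocess.check_call(['wipefs', '-a', dev])
        # Also zero the first MiB to drop stale metadata copies.
        subprocess.check_call(
            ['dd', 'if=/dev/zero', 'of=%s' % dev, 'bs=1M', 'count=1',
             'oflag=direct'])

    for guid in LUN_GUIDS:
        wipe_lun(guid)
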
2016-12-18 07:54:47,165 INFO (jsonrpc/4) [dispatcher] Run and
protect: createVG(vgname=u'097c3032-99da-4e18-b547-19ff340f846e',
devlist=[u'3600140518c78046b1594d14a65e5f9ad',
u'360014052e4947abdee947b5b55c9e28a'], force=False, options=None)
(logUtils:49)
The engine is trying to create a vg with the pv
360014052e4947abdee947b5b55c9e28a that failed earlier in getDeviceList.
This looks like a bug in the engine: it is ignoring the results of
getDeviceList.
2016-12-18 07:54:47,552 INFO (jsonrpc/4) [dispatcher] Run and
protect: createVG, Return response: {'uuid':
'zJyjdF-AvTC-Z6wN-XNRO-1jDd-2SUH-dVQTJW'} (logUtils:52)
2016-12-18 07:54:47,552 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC
call LVMVolumeGroup.create succeeded in 0.38 seconds (__init__:515)
2016-12-18 07:54:48,562 INFO (jsonrpc/5) [dispatcher] Run and
protect: createStorageDomain(storageType=3,
sdUUID=u'097c3032-99da-4e18-b547-19ff340f846e', domainName=u'iscsi',
typeSpecificArg=u'zJyjdF-AvTC-Z6wN-XNRO-1jDd-2SUH-dVQTJW', domClass=1,
domVersion=u'3', options=None) (logUtils:49)
2016-12-18 07:54:48,563 ERROR (jsonrpc/5) [storage.StorageDomainCache]
looking for unfetched domain 097c3032-99da-4e18-b547-19ff340f846e
(sdc:151)
2016-12-18 07:54:48,563 ERROR (jsonrpc/5) [storage.StorageDomainCache]
looking for domain 097c3032-99da-4e18-b547-19ff340f846e (sdc:168)
2016-12-18 07:54:48,564 ERROR (jsonrpc/5) [storage.StorageDomainCache]
domain 097c3032-99da-4e18-b547-19ff340f846e not found (sdc:157)
Traceback (most recent call last):
File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
dom = findMethod(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist:
(u'097c3032-99da-4e18-b547-19ff340f846e',)
We see a lot of these errors in the rest of the log. This means
something is wrong with this vg.
This needs deeper investigation from a storage developer on both the
engine and vdsm sides, but I would start by making sure we use clean
luns. We are not trying to test esoteric negative flows in the system
tests.
Did we change something in the system tests project or lago while we
were not looking?
Can we reproduce this issue manually with the same engine and vdsm versions?
Nir
Please have a look.
Thanks,
Barak.
--
Barak Korren
bkorren(a)redhat.com
RHCE, RHCi, RHV-DevOps Team
https://ifireball.wordpress.com/