[ OST Failure Report ] [ oVirt master ] [ 20-11-1017 ] [ 002_bootstrap.verify_add_all_hosts ]

Dafna Ron dron at redhat.com
Mon Nov 20 13:10:05 UTC 2017


Hi,

We had a failure in OST for test 002_bootstrap.verify_add_all_hosts.

>From the logs I can see that vdsm on host0 was reporting that it cannot
find the physical volume but eventually the storage was created and is
reported as responsive.

However, Host1 is reported to became non-operational with storage domain
does not exist error and I think that there is a race.

I think that we create the storage domain while host1 is being installed
and if the domain is not created and reported as activated in time,
host1 will become nonOperational.

are we starting installation of host1 before host0 and storage are active?

**

*Link to suspected patches: I do not think that the patch reported is
related to the error*

*

**

**

https://gerrit.ovirt.org/#/c/84133/**


Link to Job:

**

http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/


Link to all logs:


*

*http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/artifact/
*

*
*

*(Relevant) error snippet from the log: *

*

<error>


Lago log:
*

2017-11-18 11:15:25,472::log_utils.py::end_log_task::670::nose::INFO:: 
# add_master_storage_domain: ESC[32mSuccessESC[0m (in 0:01:09)
2017-11-18
11:15:25,472::log_utils.py::start_log_task::655::nose::INFO::  #
add_secondary_storage_domains: ESC[0mESC[0m
2017-11-18 11:16:47,455::log_utils.py::end_log_task::670::nose::INFO:: 
# add_secondary_storage_domains: ESC[32mSuccessESC[0m (in 0:01:21)
2017-11-18
11:16:47,456::log_utils.py::start_log_task::655::nose::INFO::  #
import_templates: ESC[0mESC[0m
2017-11-18 11:16:47,513::testlib.py::stopTest::198::nose::INFO::    *
SKIPPED: Exported domain generation not supported yet
2017-11-18 11:16:47,514::log_utils.py::end_log_task::670::nose::INFO:: 
# import_templates: ESC[32mSuccessESC[0m (in 0:00:00)
2017-11-18
11:16:47,514::log_utils.py::start_log_task::655::nose::INFO::  #
verify_add_all_hosts: ESC[0mESC[0m
2017-11-18
11:16:47,719::testlib.py::assert_equals_within::227::ovirtlago.testlib::ERROR::   
* Unhandled exception in <function <lambda> at 0x2909230>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
219, in assert_equals_within
    res = func()
  File
"/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
line 430, in <lambda>
    lambda: _all_hosts_up(hosts_service, total_hosts)
  File
"/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
line 129, in _all_hosts_up
    _check_problematic_hosts(hosts_service)
  File
"/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
line 149, in _check_problematic_hosts
    raise RuntimeError(dump_hosts)
RuntimeError: 1 hosts failed installation:
lago-basic-suite-master-host-1: non_operational

2017-11-18
11:16:47,722::utils.py::wrapper::480::lago.utils::DEBUG::Looking for a
workdir
2017-11-18
11:16:47,722::workdir.py::resolve_workdir_path::361::lago.workdir::DEBUG::Checking
if /dev/shm/ost/deployment-basic-suite-master is a workdir
2017-11-18
11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO::    *
Collect artifacts: ESC[0mESC[0m
2017-11-18
11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO::    *
Collect artifacts: ESC[0mESC[0m

vdsm host0:

2017-11-18 06:14:23,980-0500 INFO  (jsonrpc/0) [vdsm.api] START
getDeviceList(storageType=3,
guids=[u'360014059618895272774e97a2aaf5dd6'], checkStatus=False,
options={}) from=::ffff:192.168.201.4,45636,
flow_id=ed8310a1-a7af-4a67-b351-8ff
364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:46)
2017-11-18 06:14:24,353-0500 WARN  (jsonrpc/0) [storage.LVM] lvm pvs
failed: 5 [] ['  Failed to find physical volume
"/dev/mapper/360014059618895272774e97a2aaf5dd6".'] (lvm:322)
2017-11-18 06:14:24,353-0500 WARN  (jsonrpc/0) [storage.HSM] getPV
failed for guid: 360014059618895272774e97a2aaf5dd6 (hsm:1973)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line
1970, in _getDeviceList
    pv = lvm.getPV(guid)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 852,
in getPV
    raise se.InaccessiblePhysDev((pvName,))
InaccessiblePhysDev: Multipath cannot access physical device(s):
"devices=(u'360014059618895272774e97a2aaf5dd6',)"
2017-11-18 06:14:24,389-0500 INFO  (jsonrpc/0) [vdsm.api] FINISH
getDeviceList return={'devList': [{'status': 'unknown', 'vendorID':
'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0',
'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist':
[{'initiatorname': u'default', 'connection': u'192.168.200.4', 'iqn':
u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': u'username',
'password': '********', 'port': '3260'}, {'initiatorname': u'default',
'connection': u'192.168.201.4', 'iqn': u'iqn.2014-07.org.ovirt:storage',
'portal': '1', 'user': u'username', 'password': '********', 'port':
'3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 1073741824,
'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity':
'21474836480', 'state': 'active', 'lun': '0'}, {'type': 'iSCSI',
'physdev': 'sdf', 'capacity': '21474836480', 'state': 'active', 'lun':
'0'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '',
'serial': 'SLIO-ORG_lun0_bdev_96188952-7277-4e97-a2aa-f5dd6aad6fc2',
'GUID': '360014059618895272774e97a2aaf5dd6', 'productID': 'lun0_bdev'}]}
from=::ffff:192.168.201.4,45636,
flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a,
task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:52)


2017-11-18 06:14:31,788-0500 INFO  (jsonrpc/0) [vdsm.api] FINISH
getStorageDomainInfo return={'info': {'uuid':
'cc61e074-a3b6-4371-9185-66079a39f123', 'vgMetadataDevice':
'360014059618895272774e97a2aaf5dd6', 'vguuid': '7ifbmt-0elj-uWZZ-zS
LG-plA8-8hd3-JG298b', 'metadataDevice':
'360014059618895272774e97a2aaf5dd6', 'state': 'OK', 'version': '4',
'role': 'Regular', 'type': 'ISCSI', 'class': 'Data', 'pool': [], 'name':
'iscsi'}} from=::ffff:192.168.201.4,45636, flow_id=2c1876
99, task_id=c2080b61-d4a5-4bdb-9d75-f81580a8257a (api:

vdsm host1:

2017-11-18 06:16:34,315-0500 ERROR (monitor/c65437c) [storage.Monitor]
Setting up monitor for c65437ce-339f-4b01-aeb5-45c1d486bf49 failed
(monitor:329)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line
326, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line
348, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 177, in
wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line
366, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110,
in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51,
in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134,
in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151,
in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176,
in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist:
(u'c65437ce-339f-4b01-aeb5-45c1d486bf49',)
2017-11-18 06:16:40,377-0500 INFO  (jsonrpc/7) [api.host] START
getStats() from=::ffff:192.168.201.4,58722 (api:46)
2017-11-18 06:16:40,378-0500 INFO  (jsonrpc/7) [vdsm.api] START
repoStats(domains=()) from=::ffff:192.168.201.4,58722,
task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:46)
2017-11-18 06:16:40,379-0500 INFO  (jsonrpc/7) [vdsm.api] FINISH
repoStats return={u'c65437ce-339f-4b01-aeb5-45c1d486bf49': {'code': 358,
'actual': True, 'version': -1, 'acquired': False, 'delay': '0',
'lastCheck': '6.1', 'valid': False},
 u'cc61e074-a3b6-4371-9185-66079a39f123': {'code': 0, 'actual': True,
'version': 4, 'acquired': True, 'delay': '0.00103987', 'lastCheck':
'6.5', 'valid': True}} from=::ffff:192.168.201.4,58722,
task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8
 (api:52)

engine log:

2017-11-18 06:15:54,040-05 ERROR
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
(EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Domain
'c65437ce-339f-4b01-aeb5-45c1d486bf49:nfs' was reported with error code
'358'
2017-11-18 06:15:54,041-05 ERROR
[org.ovirt.engine.core.bll.InitVdsOnUpCommand]
(EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Storage Domain
'nfs' of pool 'test-dc' is in problem in host
'lago-basic-suite-master-host-1'
2017-11-18 06:15:54,045-05 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] EVENT_ID:
VDS_STORAGE_VDS_STATS_FAILED(189), Host lago-basic-suite-master-host-1
 reports about one of the Active Storage Domains as Problematic.**
**

***
*

*</error>*

**


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ovirt.org/pipermail/infra/attachments/20171120/48f78908/attachment-0001.html>


More information about the Infra mailing list