Hi,
We had a failure in OST for test 002_bootstrap.verify_add_all_hosts.
From the logs I can see that vdsm on host0 was reporting that it could not
find the physical volume, but eventually the storage was created and is
reported as responsive.
However, host1 is reported to have become non-operational with a "storage
domain does not exist" error, and I think there is a race.
I think that we create the storage domain while host1 is being installed,
and if the domain is not created and reported as active in time, host1
becomes NonOperational.
And based on the above description, this is exactly the issue I've
described in the BZ.
Y.
Are we starting the installation of host1 before host0 and the storage are
active?
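If that is the race, one way the suite could close it is to gate the add-host-1 step on the master domain becoming active, using the same polling pattern as testlib.assert_equals_within. A minimal sketch of such a wait helper follows; the helper name, timeouts, and the example condition in the docstring are my own illustration, not taken from the actual OST code:

```python
import time


def wait_until(cond, timeout=300, interval=3):
    """Poll cond() until it returns True, or raise after `timeout` seconds.

    Mirrors the retry pattern of ovirtlago.testlib.assert_equals_within.
    In the bootstrap scenario, cond would be something like (hypothetical):
        lambda: master_sd.status == types.StorageDomainStatus.ACTIVE
    """
    deadline = time.time() + timeout
    while not cond():
        if time.time() >= deadline:
            raise AssertionError('condition not met within %ss' % timeout)
        time.sleep(interval)
```

The scenario would then call this between add_master_storage_domain and the start of host1's installation, so the domain is known to be active before the second host is deployed.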
Link to suspected patches (I do not think that the patch reported is
related to the error):
https://gerrit.ovirt.org/#/c/84133/

Link to job:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/

Link to all logs:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/artifact/
(Relevant) error snippet from the log:

<error>
Lago log:
2017-11-18 11:15:25,472::log_utils.py::end_log_task::670::nose::INFO:: # add_master_storage_domain: Success (in 0:01:09)
2017-11-18 11:15:25,472::log_utils.py::start_log_task::655::nose::INFO:: # add_secondary_storage_domains:
2017-11-18 11:16:47,455::log_utils.py::end_log_task::670::nose::INFO:: # add_secondary_storage_domains: Success (in 0:01:21)
2017-11-18 11:16:47,456::log_utils.py::start_log_task::655::nose::INFO:: # import_templates:
2017-11-18 11:16:47,513::testlib.py::stopTest::198::nose::INFO:: * SKIPPED: Exported domain generation not supported yet
2017-11-18 11:16:47,514::log_utils.py::end_log_task::670::nose::INFO:: # import_templates: Success (in 0:00:00)
2017-11-18 11:16:47,514::log_utils.py::start_log_task::655::nose::INFO:: # verify_add_all_hosts:
2017-11-18 11:16:47,719::testlib.py::assert_equals_within::227::ovirtlago.testlib::ERROR:: * Unhandled exception in <function <lambda> at 0x2909230>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 219, in assert_equals_within
    res = func()
  File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 430, in <lambda>
    lambda: _all_hosts_up(hosts_service, total_hosts)
  File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 129, in _all_hosts_up
    _check_problematic_hosts(hosts_service)
  File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 149, in _check_problematic_hosts
    raise RuntimeError(dump_hosts)
RuntimeError: 1 hosts failed installation:
lago-basic-suite-master-host-1: non_operational
2017-11-18 11:16:47,722::utils.py::wrapper::480::lago.utils::DEBUG::Looking for a workdir
2017-11-18 11:16:47,722::workdir.py::resolve_workdir_path::361::lago.workdir::DEBUG::Checking if /dev/shm/ost/deployment-basic-suite-master is a workdir
2017-11-18 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO:: * Collect artifacts:
2017-11-18 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO:: * Collect artifacts:
vdsm host0:
2017-11-18 06:14:23,980-0500 INFO (jsonrpc/0) [vdsm.api] START getDeviceList(storageType=3, guids=[u'360014059618895272774e97a2aaf5dd6'], checkStatus=False, options={}) from=::ffff:192.168.201.4,45636, flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:46)
2017-11-18 06:14:24,353-0500 WARN (jsonrpc/0) [storage.LVM] lvm pvs failed: 5 [] ['  Failed to find physical volume "/dev/mapper/360014059618895272774e97a2aaf5dd6".'] (lvm:322)
2017-11-18 06:14:24,353-0500 WARN (jsonrpc/0) [storage.HSM] getPV failed for guid: 360014059618895272774e97a2aaf5dd6 (hsm:1973)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1970, in _getDeviceList
    pv = lvm.getPV(guid)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 852, in getPV
    raise se.InaccessiblePhysDev((pvName,))
InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=(u'360014059618895272774e97a2aaf5dd6',)"
2017-11-18 06:14:24,389-0500 INFO (jsonrpc/0) [vdsm.api] FINISH getDeviceList return={'devList': [{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': u'default', 'connection': u'192.168.200.4', 'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': u'username', 'password': '********', 'port': '3260'}, {'initiatorname': u'default', 'connection': u'192.168.201.4', 'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': u'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 1073741824, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}, {'type': 'iSCSI', 'physdev': 'sdf', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun0_bdev_96188952-7277-4e97-a2aa-f5dd6aad6fc2', 'GUID': '360014059618895272774e97a2aaf5dd6', 'productID': 'lun0_bdev'}]} from=::ffff:192.168.201.4,45636, flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:52)
2017-11-18 06:14:31,788-0500 INFO (jsonrpc/0) [vdsm.api] FINISH getStorageDomainInfo return={'info': {'uuid': 'cc61e074-a3b6-4371-9185-66079a39f123', 'vgMetadataDevice': '360014059618895272774e97a2aaf5dd6', 'vguuid': '7ifbmt-0elj-uWZZ-zSLG-plA8-8hd3-JG298b', 'metadataDevice': '360014059618895272774e97a2aaf5dd6', 'state': 'OK', 'version': '4', 'role': 'Regular', 'type': 'ISCSI', 'class': 'Data', 'pool': [], 'name': 'iscsi'}} from=::ffff:192.168.201.4,45636, flow_id=2c187699, task_id=c2080b61-d4a5-4bdb-9d75-f81580a8257a (api:
vdsm host1:
2017-11-18 06:16:34,315-0500 ERROR (monitor/c65437c) [storage.Monitor] Setting up monitor for c65437ce-339f-4b01-aeb5-45c1d486bf49 failed (monitor:329)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 326, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 348, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 177, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 366, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'c65437ce-339f-4b01-aeb5-45c1d486bf49',)
2017-11-18 06:16:40,377-0500 INFO (jsonrpc/7) [api.host] START getStats() from=::ffff:192.168.201.4,58722 (api:46)
2017-11-18 06:16:40,378-0500 INFO (jsonrpc/7) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.201.4,58722, task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:46)
2017-11-18 06:16:40,379-0500 INFO (jsonrpc/7) [vdsm.api] FINISH repoStats return={u'c65437ce-339f-4b01-aeb5-45c1d486bf49': {'code': 358, 'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '6.1', 'valid': False}, u'cc61e074-a3b6-4371-9185-66079a39f123': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00103987', 'lastCheck': '6.5', 'valid': True}} from=::ffff:192.168.201.4,58722, task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:52)
engine log:
2017-11-18 06:15:54,040-05 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Domain 'c65437ce-339f-4b01-aeb5-45c1d486bf49:nfs' was reported with error code '358'
2017-11-18 06:15:54,041-05 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Storage Domain 'nfs' of pool 'test-dc' is in problem in host 'lago-basic-suite-master-host-1'
2017-11-18 06:15:54,045-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] EVENT_ID: VDS_STORAGE_VDS_STATS_FAILED(189), Host lago-basic-suite-master-host-1 reports about one of the Active Storage Domains as Problematic.
</error>
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel