Hi,
We had a failure in OST in test 002_bootstrap.verify_add_all_hosts.
From the logs I can see that vdsm on host0 reported that it could not find the physical volume, but eventually the storage domain was created and is reported as responsive.
However, host1 is reported as becoming non-operational with a "storage domain does not exist" error, and I think there is a race here:
we create the storage domain while host1 is still being installed, and if the domain is not created and reported as active in time, host1 becomes NonOperational.
Are we starting the installation of host1 before host0 and the storage are active?
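If that is the race, the fix on the OST side would be to gate host1's installation (or the verification step) on the domain being reported active, using the same polling pattern testlib already applies in _all_hosts_up. A minimal sketch of that pattern — the names wait_for and sd_is_active are illustrative, not the real ovirtlago API:

```python
import time

def wait_for(predicate, timeout=120, interval=3, allowed_exceptions=()):
    """Poll predicate() until it returns True or the timeout expires.

    Simplified version of the assert_equals_within pattern from
    ovirtlago.testlib; transient exceptions listed in
    allowed_exceptions are swallowed and the poll continues.
    """
    deadline = time.time() + timeout
    while True:
        try:
            if predicate():
                return True
        except allowed_exceptions:
            pass
        if time.time() >= deadline:
            raise RuntimeError('timed out waiting for condition')
        time.sleep(interval)

# Hypothetical usage: block until the storage domain is active before
# proceeding (sd_is_active would query the engine API for the domain
# status and is not defined here):
# wait_for(lambda: sd_is_active('nfs'), timeout=600)
```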
Link to suspected patches (I do not think the reported patch is related to the error):
https://gerrit.ovirt.org/#/c/84133/
Link to Job:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/
Link to all logs:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/artifact/
(Relevant) error snippet from the log:
<error>
Lago log:
2017-11-18 11:15:25,472::log_utils.py::end_log_task::670::nose::INFO:: # add_master_storage_domain: Success (in 0:01:09)
2017-11-18 11:15:25,472::log_utils.py::start_log_task::655::nose::INFO:: # add_secondary_storage_domains:
2017-11-18 11:16:47,455::log_utils.py::end_log_task::670::nose::INFO:: # add_secondary_storage_domains: Success (in 0:01:21)
2017-11-18 11:16:47,456::log_utils.py::start_log_task::655::nose::INFO:: # import_templates:
2017-11-18 11:16:47,513::testlib.py::stopTest::198::nose::INFO:: * SKIPPED: Exported domain generation not supported yet
2017-11-18 11:16:47,514::log_utils.py::end_log_task::670::nose::INFO:: # import_templates: Success (in 0:00:00)
2017-11-18 11:16:47,514::log_utils.py::start_log_task::655::nose::INFO:: # verify_add_all_hosts:
2017-11-18 11:16:47,719::testlib.py::assert_equals_within::227::ovirtlago.testlib::ERROR:: * Unhandled exception in <function <lambda> at 0x2909230>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 219, in assert_equals_within
    res = func()
  File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 430, in <lambda>
    lambda: _all_hosts_up(hosts_service, total_hosts)
  File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 129, in _all_hosts_up
    _check_problematic_hosts(hosts_service)
  File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 149, in _check_problematic_hosts
    raise RuntimeError(dump_hosts)
RuntimeError: 1 hosts failed installation:
lago-basic-suite-master-host-1: non_operational
2017-11-18 11:16:47,722::utils.py::wrapper::480::lago.utils::DEBUG::Looking for a workdir
2017-11-18 11:16:47,722::workdir.py::resolve_workdir_path::361::lago.workdir::DEBUG::Checking if /dev/shm/ost/deployment-basic-suite-master is a workdir
2017-11-18 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO:: * Collect artifacts:
2017-11-18 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO:: * Collect artifacts:
vdsm host0:
2017-11-18 06:14:23,980-0500 INFO (jsonrpc/0) [vdsm.api] START getDeviceList(storageType=3, guids=[u'360014059618895272774e97a2aaf5dd6'], checkStatus=False, options={}) from=::ffff:192.168.201.4,45636, flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:46)
2017-11-18 06:14:24,353-0500 WARN (jsonrpc/0) [storage.LVM] lvm pvs failed: 5 [] [' Failed to find physical volume "/dev/mapper/360014059618895272774e97a2aaf5dd6".'] (lvm:322)
2017-11-18 06:14:24,353-0500 WARN (jsonrpc/0) [storage.HSM] getPV failed for guid: 360014059618895272774e97a2aaf5dd6 (hsm:1973)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1970, in _getDeviceList
    pv = lvm.getPV(guid)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 852, in getPV
    raise se.InaccessiblePhysDev((pvName,))
InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=(u'360014059618895272774e97a2aaf5dd6',)"
2017-11-18 06:14:24,389-0500 INFO (jsonrpc/0) [vdsm.api] FINISH getDeviceList return={'devList': [{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': u'default', 'connection': u'192.168.200.4', 'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': u'username', 'password': '********', 'port': '3260'}, {'initiatorname': u'default', 'connection': u'192.168.201.4', 'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': u'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 1073741824, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}, {'type': 'iSCSI', 'physdev': 'sdf', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun0_bdev_96188952-7277-4e97-a2aa-f5dd6aad6fc2', 'GUID': '360014059618895272774e97a2aaf5dd6', 'productID': 'lun0_bdev'}]} from=::ffff:192.168.201.4,45636, flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:52)
2017-11-18 06:14:31,788-0500 INFO (jsonrpc/0) [vdsm.api] FINISH getStorageDomainInfo return={'info': {'uuid': 'cc61e074-a3b6-4371-9185-66079a39f123', 'vgMetadataDevice': '360014059618895272774e97a2aaf5dd6', 'vguuid': '7ifbmt-0elj-uWZZ-zSLG-plA8-8hd3-JG298b', 'metadataDevice': '360014059618895272774e97a2aaf5dd6', 'state': 'OK', 'version': '4', 'role': 'Regular', 'type': 'ISCSI', 'class': 'Data', 'pool': [], 'name': 'iscsi'}} from=::ffff:192.168.201.4,45636, flow_id=2c187699, task_id=c2080b61-d4a5-4bdb-9d75-f81580a8257a (api:
vdsm host1:
2017-11-18 06:16:34,315-0500 ERROR (monitor/c65437c) [storage.Monitor] Setting up monitor for c65437ce-339f-4b01-aeb5-45c1d486bf49 failed (monitor:329)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 326, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 348, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 177, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 366, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'c65437ce-339f-4b01-aeb5-45c1d486bf49',)
2017-11-18 06:16:40,377-0500 INFO (jsonrpc/7) [api.host] START getStats() from=::ffff:192.168.201.4,58722 (api:46)
2017-11-18 06:16:40,378-0500 INFO (jsonrpc/7) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.201.4,58722, task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:46)
2017-11-18 06:16:40,379-0500 INFO (jsonrpc/7) [vdsm.api] FINISH repoStats return={u'c65437ce-339f-4b01-aeb5-45c1d486bf49': {'code': 358, 'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '6.1', 'valid': False}, u'cc61e074-a3b6-4371-9185-66079a39f123': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00103987', 'lastCheck': '6.5', 'valid': True}} from=::ffff:192.168.201.4,58722, task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:52)
engine log:
2017-11-18 06:15:54,040-05 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Domain 'c65437ce-339f-4b01-aeb5-45c1d486bf49:nfs' was reported with error code '358'
2017-11-18 06:15:54,041-05 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Storage Domain 'nfs' of pool 'test-dc' is in problem in host 'lago-basic-suite-master-host-1'
2017-11-18 06:15:54,045-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] EVENT_ID: VDS_STORAGE_VDS_STATS_FAILED(189), Host lago-basic-suite-master-host-1 reports about one of the Active Storage Domains as Problematic.
</error>
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel