[ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 20-11-1017 ] [ 002_bootstrap.verify_add_all_hosts ]
Dafna Ron
dron at redhat.com
Mon Nov 20 13:51:00 UTC 2017
On 11/20/2017 01:16 PM, Yaniv Kaul wrote:
>
>
> On Mon, Nov 20, 2017 at 3:10 PM, Dafna Ron <dron at redhat.com
> <mailto:dron at redhat.com>> wrote:
>
> Hi,
>
> We had a failure in OST for test 002_bootstrap.verify_add_all_hosts.
>
> From the logs I can see that vdsm on host0 was reporting that it
> cannot find the physical volume but eventually the storage was
> created and is reported as responsive.
>
> However, Host1 is reported to became non-operational with storage
> domain does not exist error and I think that there is a race.
>
>
> I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1514906 on this.
>
>
> I think that we create the storage domain while host1 is being
> installed and if the domain is not created and reported as
> activated in time, host1 will become nonOperational.
>
>
> And based on the above description, this is exactly the issue I've
> described in the BZ.
> Y.
>
+1 this seems exactly the same issue.
> are we starting installation of host1 before host0 and storage are
> active?
>
> **
>
> *Link to suspected patches: I do not think that the patch reported
> is related to the error*
>
> *
>
> **
>
> **
>
> https://gerrit.ovirt.org/#/c/84133/
> <https://gerrit.ovirt.org/#/c/84133/>**
>
>
> Link to Job:
>
> **
>
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/
> <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/>
>
>
> Link to all logs:
>
>
> *
>
> *http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/artifact/
> <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/artifact/>
> *
>
> *
> *
>
> *(Relevant) error snippet from the log: *
>
> *
>
> <error>
>
>
> Lago log:
> *
>
> 2017-11-18
> 11:15:25,472::log_utils.py::end_log_task::670::nose::INFO:: #
> add_master_storage_domain: ESC[32mSuccessESC[0m (in 0:01:09)
> 2017-11-18
> 11:15:25,472::log_utils.py::start_log_task::655::nose::INFO:: #
> add_secondary_storage_domains: ESC[0mESC[0m
> 2017-11-18
> 11:16:47,455::log_utils.py::end_log_task::670::nose::INFO:: #
> add_secondary_storage_domains: ESC[32mSuccessESC[0m (in 0:01:21)
> 2017-11-18
> 11:16:47,456::log_utils.py::start_log_task::655::nose::INFO:: #
> import_templates: ESC[0mESC[0m
> 2017-11-18
> 11:16:47,513::testlib.py::stopTest::198::nose::INFO:: *
> SKIPPED: Exported domain generation not supported yet
> 2017-11-18
> 11:16:47,514::log_utils.py::end_log_task::670::nose::INFO:: #
> import_templates: ESC[32mSuccessESC[0m (in 0:00:00)
> 2017-11-18
> 11:16:47,514::log_utils.py::start_log_task::655::nose::INFO:: #
> verify_add_all_hosts: ESC[0mESC[0m
> 2017-11-18
> 11:16:47,719::testlib.py::assert_equals_within::227::ovirtlago.testlib::ERROR::
> * Unhandled exception in <function <lambda> at 0x2909230>
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py",
> line 219, in assert_equals_within
> res = func()
> File
> "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
> line 430, in <lambda>
> lambda: _all_hosts_up(hosts_service, total_hosts)
> File
> "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
> line 129, in _all_hosts_up
> _check_problematic_hosts(hosts_service)
> File
> "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
> line 149, in _check_problematic_hosts
> raise RuntimeError(dump_hosts)
> RuntimeError: 1 hosts failed installation:
> lago-basic-suite-master-host-1: non_operational
>
> 2017-11-18
> 11:16:47,722::utils.py::wrapper::480::lago.utils::DEBUG::Looking
> for a workdir
> 2017-11-18
> 11:16:47,722::workdir.py::resolve_workdir_path::361::lago.workdir::DEBUG::Checking
> if /dev/shm/ost/deployment-basic-suite-master is a workdir
> 2017-11-18
> 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO::
> * Collect artifacts: ESC[0mESC[0m
> 2017-11-18
> 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO::
> * Collect artifacts: ESC[0mESC[0m
>
> vdsm host0:
>
> 2017-11-18 06:14:23,980-0500 INFO (jsonrpc/0) [vdsm.api] START
> getDeviceList(storageType=3,
> guids=[u'360014059618895272774e97a2aaf5dd6'], checkStatus=False,
> options={}) from=::ffff:192.168.201.4,45636,
> flow_id=ed8310a1-a7af-4a67-b351-8ff
> 364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:46)
> 2017-11-18 06:14:24,353-0500 WARN (jsonrpc/0) [storage.LVM] lvm
> pvs failed: 5 [] [' Failed to find physical volume
> "/dev/mapper/360014059618895272774e97a2aaf5dd6".'] (lvm:322)
> 2017-11-18 06:14:24,353-0500 WARN (jsonrpc/0) [storage.HSM] getPV
> failed for guid: 360014059618895272774e97a2aaf5dd6 (hsm:1973)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py",
> line 1970, in _getDeviceList
> pv = lvm.getPV(guid)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py",
> line 852, in getPV
> raise se.InaccessiblePhysDev((pvName,))
> InaccessiblePhysDev: Multipath cannot access physical device(s):
> "devices=(u'360014059618895272774e97a2aaf5dd6',)"
> 2017-11-18 06:14:24,389-0500 INFO (jsonrpc/0) [vdsm.api] FINISH
> getDeviceList return={'devList': [{'status': 'unknown',
> 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0',
> 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist':
> [{'initiatorname': u'default', 'connection': u'192.168.200.4',
> 'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user':
> u'username', 'password': '********', 'port': '3260'},
> {'initiatorname': u'default', 'connection': u'192.168.201.4',
> 'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user':
> u'username', 'password': '********', 'port': '3260'}],
> 'logicalblocksize': '512', 'discard_max_bytes': 1073741824,
> 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity':
> '21474836480', 'state': 'active', 'lun': '0'}, {'type': 'iSCSI',
> 'physdev': 'sdf', 'capacity': '21474836480', 'state': 'active',
> 'lun': '0'}], 'devtype': 'iSCSI', 'physicalblocksize': '512',
> 'pvUUID': '', 'serial':
> 'SLIO-ORG_lun0_bdev_96188952-7277-4e97-a2aa-f5dd6aad6fc2', 'GUID':
> '360014059618895272774e97a2aaf5dd6', 'productID': 'lun0_bdev'}]}
> from=::ffff:192.168.201.4,45636,
> flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a,
> task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:52)
>
>
> 2017-11-18 06:14:31,788-0500 INFO (jsonrpc/0) [vdsm.api] FINISH
> getStorageDomainInfo return={'info': {'uuid':
> 'cc61e074-a3b6-4371-9185-66079a39f123', 'vgMetadataDevice':
> '360014059618895272774e97a2aaf5dd6', 'vguuid': '7ifbmt-0elj-uWZZ-zS
> LG-plA8-8hd3-JG298b', 'metadataDevice':
> '360014059618895272774e97a2aaf5dd6', 'state': 'OK', 'version':
> '4', 'role': 'Regular', 'type': 'ISCSI', 'class': 'Data', 'pool':
> [], 'name': 'iscsi'}} from=::ffff:192.168.201.4,45636, flow_id=2c1876
> 99, task_id=c2080b61-d4a5-4bdb-9d75-f81580a8257a (api:
>
> vdsm host1:
>
> 2017-11-18 06:16:34,315-0500 ERROR (monitor/c65437c)
> [storage.Monitor] Setting up monitor for
> c65437ce-339f-4b01-aeb5-45c1d486bf49 failed (monitor:329)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py",
> line 326, in _setupLoop
> self._setupMonitor()
> File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py",
> line 348, in _setupMonitor
> self._produceDomain()
> File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 177,
> in wrapper
> value = meth(self, *a, **kw)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py",
> line 366, in _produceDomain
> self.domain = sdCache.produce(self.sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
> line 110, in produce
> domain.getRealDomain()
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
> line 51, in getRealDomain
> return self._cache._realProduce(self._sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
> line 134, in _realProduce
> domain = self._findDomain(sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
> line 151, in _findDomain
> return findMethod(sdUUID)
> File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
> line 176, in _findUnfetchedDomain
> raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> (u'c65437ce-339f-4b01-aeb5-45c1d486bf49',)
> 2017-11-18 06:16:40,377-0500 INFO (jsonrpc/7) [api.host] START
> getStats() from=::ffff:192.168.201.4,58722 (api:46)
> 2017-11-18 06:16:40,378-0500 INFO (jsonrpc/7) [vdsm.api] START
> repoStats(domains=()) from=::ffff:192.168.201.4,58722,
> task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:46)
> 2017-11-18 06:16:40,379-0500 INFO (jsonrpc/7) [vdsm.api] FINISH
> repoStats return={u'c65437ce-339f-4b01-aeb5-45c1d486bf49':
> {'code': 358, 'actual': True, 'version': -1, 'acquired': False,
> 'delay': '0', 'lastCheck': '6.1', 'valid': False},
> u'cc61e074-a3b6-4371-9185-66079a39f123': {'code': 0, 'actual':
> True, 'version': 4, 'acquired': True, 'delay': '0.00103987',
> 'lastCheck': '6.5', 'valid': True}}
> from=::ffff:192.168.201.4,58722,
> task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8
> (api:52)
>
> engine log:
>
> 2017-11-18 06:15:54,040-05 ERROR
> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
> (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Domain
> 'c65437ce-339f-4b01-aeb5-45c1d486bf49:nfs' was reported with error
> code '358'
> 2017-11-18 06:15:54,041-05 ERROR
> [org.ovirt.engine.core.bll.InitVdsOnUpCommand]
> (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Storage
> Domain 'nfs' of pool 'test-dc' is in problem in host
> 'lago-basic-suite-master-host-1'
> 2017-11-18 06:15:54,045-05 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] EVENT_ID:
> VDS_STORAGE_VDS_STATS_FAILED(189), Host lago-basic-suite-master-host-1
> reports about one of the Active Storage Domains as Problematic.**
> **
>
> *
> *
>
> *</error>*
>
> **
>
>
>
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org <mailto:Devel at ovirt.org>
> http://lists.ovirt.org/mailman/listinfo/devel
> <http://lists.ovirt.org/mailman/listinfo/devel>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ovirt.org/pipermail/infra/attachments/20171120/19299eeb/attachment-0001.html>
More information about the Infra
mailing list