[ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 20-11-1017 ] [ 002_bootstrap.verify_add_all_hosts ]

Yaniv Kaul ykaul at redhat.com
Mon Nov 20 13:16:01 UTC 2017


On Mon, Nov 20, 2017 at 3:10 PM, Dafna Ron <dron at redhat.com> wrote:

> Hi,
>
> We had a failure in OST for test 002_bootstrap.verify_add_all_hosts.
>
> From the logs I can see that vdsm on host0 was reporting that it cannot
> find the physical volume but eventually the storage was created and is
> reported as responsive.
>
> However, Host1 is reported to became non-operational with storage domain
> does not exist error and I think that there is a race.
>

I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1514906 on this.


> I think that we create the storage domain while host1 is being installed
> and if the domain is not created and reported as activated in time, host1
> will become nonOperational.
>

And based on the above description, this is exactly the issue I've
described in the BZ.
Y.


> are we starting installation of host1 before host0 and storage are active?
>
> *Link to suspected patches: I do not think that the patch reported is
> related to the error*
>
>
> * https://gerrit.ovirt.org/#/c/84133/
> <https://gerrit.ovirt.org/#/c/84133/> Link to Job: *
>
>
>
> * http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/
> <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/> Link
> to all logs: *
>
>
> *http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/artifact/
> <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/artifact/>
> *
>
>
> *(Relevant) error snippet from the log: *
>
>
>
> * <error> Lago log: *
>
> 2017-11-18 11:15:25,472::log_utils.py::end_log_task::670::nose::INFO::  #
> add_master_storage_domain: ESC[32mSuccessESC[0m (in 0:01:09)
> 2017-11-18 11:15:25,472::log_utils.py::start_log_task::655::nose::INFO::
> # add_secondary_storage_domains: ESC[0mESC[0m
> 2017-11-18 11:16:47,455::log_utils.py::end_log_task::670::nose::INFO::  #
> add_secondary_storage_domains: ESC[32mSuccessESC[0m (in 0:01:21)
> 2017-11-18 11:16:47,456::log_utils.py::start_log_task::655::nose::INFO::
> # import_templates: ESC[0mESC[0m
> 2017-11-18 11:16:47,513::testlib.py::stopTest::198::nose::INFO::    *
> SKIPPED: Exported domain generation not supported yet
> 2017-11-18 11:16:47,514::log_utils.py::end_log_task::670::nose::INFO::  #
> import_templates: ESC[32mSuccessESC[0m (in 0:00:00)
> 2017-11-18 11:16:47,514::log_utils.py::start_log_task::655::nose::INFO::
> # verify_add_all_hosts: ESC[0mESC[0m
> 2017-11-18 11:16:47,719::testlib.py::assert_equals_within::227::ovirtlago.testlib::ERROR::
> * Unhandled exception in <function <lambda> at 0x2909230>
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 219,
> in assert_equals_within
>     res = func()
>   File "/home/jenkins/workspace/ovirt-master_change-queue-
> tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
> line 430, in <lambda>
>     lambda: _all_hosts_up(hosts_service, total_hosts)
>   File "/home/jenkins/workspace/ovirt-master_change-queue-
> tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
> line 129, in _all_hosts_up
>     _check_problematic_hosts(hosts_service)
>   File "/home/jenkins/workspace/ovirt-master_change-queue-
> tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
> line 149, in _check_problematic_hosts
>     raise RuntimeError(dump_hosts)
> RuntimeError: 1 hosts failed installation:
> lago-basic-suite-master-host-1: non_operational
>
> 2017-11-18 11:16:47,722::utils.py::wrapper::480::lago.utils::DEBUG::Looking
> for a workdir
> 2017-11-18 11:16:47,722::workdir.py::resolve_workdir_path::361::lago.workdir::DEBUG::Checking
> if /dev/shm/ost/deployment-basic-suite-master is a workdir
> 2017-11-18 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO::
> * Collect artifacts: ESC[0mESC[0m
> 2017-11-18 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO::
> * Collect artifacts: ESC[0mESC[0m
>
> vdsm host0:
>
> 2017-11-18 06:14:23,980-0500 INFO  (jsonrpc/0) [vdsm.api] START
> getDeviceList(storageType=3, guids=[u'360014059618895272774e97a2aaf5dd6'],
> checkStatus=False, options={}) from=::ffff:192.168.201.4,45636,
> flow_id=ed8310a1-a7af-4a67-b351-8ff
> 364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:46)
> 2017-11-18 06:14:24,353-0500 WARN  (jsonrpc/0) [storage.LVM] lvm pvs
> failed: 5 [] ['  Failed to find physical volume "/dev/mapper/
> 360014059618895272774e97a2aaf5dd6".'] (lvm:322)
> 2017-11-18 06:14:24,353-0500 WARN  (jsonrpc/0) [storage.HSM] getPV failed
> for guid: 360014059618895272774e97a2aaf5dd6 (hsm:1973)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1970,
> in _getDeviceList
>     pv = lvm.getPV(guid)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 852,
> in getPV
>     raise se.InaccessiblePhysDev((pvName,))
> InaccessiblePhysDev: Multipath cannot access physical device(s):
> "devices=(u'360014059618895272774e97a2aaf5dd6',)"
> 2017-11-18 06:14:24,389-0500 INFO  (jsonrpc/0) [vdsm.api] FINISH
> getDeviceList return={'devList': [{'status': 'unknown', 'vendorID':
> 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0',
> 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist':
> [{'initiatorname': u'default', 'connection': u'192.168.200.4', 'iqn':
> u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': u'username',
> 'password': '********', 'port': '3260'}, {'initiatorname': u'default',
> 'connection': u'192.168.201.4', 'iqn': u'iqn.2014-07.org.ovirt:storage',
> 'portal': '1', 'user': u'username', 'password': '********', 'port':
> '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 1073741824,
> 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity':
> '21474836480', 'state': 'active', 'lun': '0'}, {'type': 'iSCSI', 'physdev':
> 'sdf', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}],
> 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial':
> 'SLIO-ORG_lun0_bdev_96188952-7277-4e97-a2aa-f5dd6aad6fc2', 'GUID': '
> 360014059618895272774e97a2aaf5dd6', 'productID': 'lun0_bdev'}]}
> from=::ffff:192.168.201.4,45636, flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a,
> task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:52)
>
>
> 2017-11-18 06:14:31,788-0500 INFO  (jsonrpc/0) [vdsm.api] FINISH
> getStorageDomainInfo return={'info': {'uuid': 'cc61e074-a3b6-4371-9185-66079a39f123',
> 'vgMetadataDevice': '360014059618895272774e97a2aaf5dd6', 'vguuid':
> '7ifbmt-0elj-uWZZ-zS
> LG-plA8-8hd3-JG298b', 'metadataDevice': '360014059618895272774e97a2aaf5dd6',
> 'state': 'OK', 'version': '4', 'role': 'Regular', 'type': 'ISCSI', 'class':
> 'Data', 'pool': [], 'name': 'iscsi'}} from=::ffff:192.168.201.4,45636,
> flow_id=2c1876
> 99, task_id=c2080b61-d4a5-4bdb-9d75-f81580a8257a (api:
>
> vdsm host1:
>
> 2017-11-18 06:16:34,315-0500 ERROR (monitor/c65437c) [storage.Monitor]
> Setting up monitor for c65437ce-339f-4b01-aeb5-45c1d486bf49 failed
> (monitor:329)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line
> 326, in _setupLoop
>     self._setupMonitor()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line
> 348, in _setupMonitor
>     self._produceDomain()
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 177, in
> wrapper
>     value = meth(self, *a, **kw)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line
> 366, in _produceDomain
>     self.domain = sdCache.produce(self.sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110,
> in produce
>     domain.getRealDomain()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51,
> in getRealDomain
>     return self._cache._realProduce(self._sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134,
> in _realProduce
>     domain = self._findDomain(sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151,
> in _findDomain
>     return findMethod(sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176,
> in _findUnfetchedDomain
>     raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> (u'c65437ce-339f-4b01-aeb5-45c1d486bf49',)
> 2017-11-18 06:16:40,377-0500 INFO  (jsonrpc/7) [api.host] START getStats()
> from=::ffff:192.168.201.4,58722 (api:46)
> 2017-11-18 06:16:40,378-0500 INFO  (jsonrpc/7) [vdsm.api] START
> repoStats(domains=()) from=::ffff:192.168.201.4,58722,
> task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:46)
> 2017-11-18 06:16:40,379-0500 INFO  (jsonrpc/7) [vdsm.api] FINISH repoStats
> return={u'c65437ce-339f-4b01-aeb5-45c1d486bf49': {'code': 358, 'actual':
> True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '6.1',
> 'valid': False},
>  u'cc61e074-a3b6-4371-9185-66079a39f123': {'code': 0, 'actual': True,
> 'version': 4, 'acquired': True, 'delay': '0.00103987', 'lastCheck': '6.5',
> 'valid': True}} from=::ffff:192.168.201.4,58722,
> task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8
>  (api:52)
>
> engine log:
>
> 2017-11-18 06:15:54,040-05 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
> (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Domain
> 'c65437ce-339f-4b01-aeb5-45c1d486bf49:nfs' was reported with error code
> '358'
> 2017-11-18 06:15:54,041-05 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand]
> (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Storage Domain
> 'nfs' of pool 'test-dc' is in problem in host 'lago-basic-suite-master-host-
> 1'
> 2017-11-18 06:15:54,045-05 ERROR [org.ovirt.engine.core.dal.
> dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-29)
> [4ce8aff3] EVENT_ID: VDS_STORAGE_VDS_STATS_FAILED(189), Host
> lago-basic-suite-master-host-1
>  reports about one of the Active Storage Domains as Problematic.
>
>
> *</error>*
>
>
>
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ovirt.org/pipermail/devel/attachments/20171120/d146dbf4/attachment-0001.html>


More information about the Devel mailing list