[ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 20-11-2017 ] [ 002_bootstrap.verify_add_all_hosts ]

Dafna Ron dron at redhat.com
Mon Nov 20 13:51:00 UTC 2017


On 11/20/2017 01:16 PM, Yaniv Kaul wrote:
>
>
> On Mon, Nov 20, 2017 at 3:10 PM, Dafna Ron <dron at redhat.com> wrote:
>
>     Hi,
>
>     We had a failure in OST for test 002_bootstrap.verify_add_all_hosts.
>
>     From the logs I can see that vdsm on host0 was reporting that it
>     could not find the physical volume, but eventually the storage was
>     created and is reported as responsive.
>
>     However, host1 is reported to become non-operational with a
>     "storage domain does not exist" error, and I think that there is a race.
>
>
> I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1514906 on this. 
>  
>
>     I think that we create the storage domain while host1 is being
>     installed, and if the domain is not created and reported as
>     active in time, host1 will become non-operational.
>
>
> And based on the above description, this is exactly the issue I've
> described in the BZ.
> Y.
>  

+1, this seems like exactly the same issue.

>     Are we starting the installation of host1 before host0 and the
>     storage are active?
>
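To illustrate the kind of guard that would close this race, here is a
minimal, untested sketch. The helper name and flow are hypothetical, and it
uses the ovirtsdk4 Python bindings directly rather than the actual OST
helpers: block until the master storage domain is reported ACTIVE in the
data center before we go on to install and verify host-1.

    import time

    import ovirtsdk4.types as types


    def wait_for_active_sd(connection, dc_name, sd_name, timeout=600):
        # Poll the storage domain attached to the data center until the
        # engine reports it as ACTIVE, or give up after `timeout` seconds.
        system = connection.system_service()
        dc = system.data_centers_service().list(
            search='name=%s' % dc_name)[0]
        attached = system.data_centers_service().data_center_service(
            dc.id).storage_domains_service()
        deadline = time.time() + timeout
        while time.time() < deadline:
            sds = [sd for sd in attached.list() if sd.name == sd_name]
            if sds and sds[0].status == types.StorageDomainStatus.ACTIVE:
                return
            time.sleep(5)
        raise RuntimeError(
            '%s did not become ACTIVE within %d seconds' % (sd_name, timeout))

    # usage (assuming an existing ovirtsdk4.Connection to the engine):
    #   wait_for_active_sd(connection, 'test-dc', 'nfs')
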
>
>     Link to suspected patches (I do not think that the reported patch
>     is related to the error):
>
>     https://gerrit.ovirt.org/#/c/84133/
>
>
>     Link to Job:
>
>
>     http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/
>
>
>     Link to all logs:
>
>
>
>     http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/artifact/
>
>
>     (Relevant) error snippet from the log:
>
>
>     <error>
>
>
>     Lago log:
>
>     2017-11-18
>     11:15:25,472::log_utils.py::end_log_task::670::nose::INFO::  #
>     add_master_storage_domain: Success (in 0:01:09)
>     2017-11-18
>     11:15:25,472::log_utils.py::start_log_task::655::nose::INFO::  #
>     add_secondary_storage_domains:
>     2017-11-18
>     11:16:47,455::log_utils.py::end_log_task::670::nose::INFO::  #
>     add_secondary_storage_domains: Success (in 0:01:21)
>     2017-11-18
>     11:16:47,456::log_utils.py::start_log_task::655::nose::INFO::  #
>     import_templates:
>     2017-11-18
>     11:16:47,513::testlib.py::stopTest::198::nose::INFO::    *
>     SKIPPED: Exported domain generation not supported yet
>     2017-11-18
>     11:16:47,514::log_utils.py::end_log_task::670::nose::INFO::  #
>     import_templates: Success (in 0:00:00)
>     2017-11-18
>     11:16:47,514::log_utils.py::start_log_task::655::nose::INFO::  #
>     verify_add_all_hosts:
>     2017-11-18
>     11:16:47,719::testlib.py::assert_equals_within::227::ovirtlago.testlib::ERROR::   
>     * Unhandled exception in <function <lambda> at 0x2909230>
>     Traceback (most recent call last):
>       File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py",
>     line 219, in assert_equals_within
>         res = func()
>       File
>     "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
>     line 430, in <lambda>
>         lambda: _all_hosts_up(hosts_service, total_hosts)
>       File
>     "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
>     line 129, in _all_hosts_up
>         _check_problematic_hosts(hosts_service)
>       File
>     "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py",
>     line 149, in _check_problematic_hosts
>         raise RuntimeError(dump_hosts)
>     RuntimeError: 1 hosts failed installation:
>     lago-basic-suite-master-host-1: non_operational
>
>     2017-11-18
>     11:16:47,722::utils.py::wrapper::480::lago.utils::DEBUG::Looking
>     for a workdir
>     2017-11-18
>     11:16:47,722::workdir.py::resolve_workdir_path::361::lago.workdir::DEBUG::Checking
>     if /dev/shm/ost/deployment-basic-suite-master is a workdir
>     2017-11-18
>     11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO::   
>     * Collect artifacts:
>     2017-11-18
>     11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO::   
>     * Collect artifacts:
>
>     vdsm host0:
>
>     2017-11-18 06:14:23,980-0500 INFO  (jsonrpc/0) [vdsm.api] START
>     getDeviceList(storageType=3,
>     guids=[u'360014059618895272774e97a2aaf5dd6'], checkStatus=False,
>     options={}) from=::ffff:192.168.201.4,45636,
>     flow_id=ed8310a1-a7af-4a67-b351-8ff
>     364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:46)
>     2017-11-18 06:14:24,353-0500 WARN  (jsonrpc/0) [storage.LVM] lvm
>     pvs failed: 5 [] ['  Failed to find physical volume
>     "/dev/mapper/360014059618895272774e97a2aaf5dd6".'] (lvm:322)
>     2017-11-18 06:14:24,353-0500 WARN  (jsonrpc/0) [storage.HSM] getPV
>     failed for guid: 360014059618895272774e97a2aaf5dd6 (hsm:1973)
>     Traceback (most recent call last):
>       File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py",
>     line 1970, in _getDeviceList
>         pv = lvm.getPV(guid)
>       File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py",
>     line 852, in getPV
>         raise se.InaccessiblePhysDev((pvName,))
>     InaccessiblePhysDev: Multipath cannot access physical device(s):
>     "devices=(u'360014059618895272774e97a2aaf5dd6',)"
>     2017-11-18 06:14:24,389-0500 INFO  (jsonrpc/0) [vdsm.api] FINISH
>     getDeviceList return={'devList': [{'status': 'unknown',
>     'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0',
>     'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist':
>     [{'initiatorname': u'default', 'connection': u'192.168.200.4',
>     'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user':
>     u'username', 'password': '********', 'port': '3260'},
>     {'initiatorname': u'default', 'connection': u'192.168.201.4',
>     'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user':
>     u'username', 'password': '********', 'port': '3260'}],
>     'logicalblocksize': '512', 'discard_max_bytes': 1073741824,
>     'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity':
>     '21474836480', 'state': 'active', 'lun': '0'}, {'type': 'iSCSI',
>     'physdev': 'sdf', 'capacity': '21474836480', 'state': 'active',
>     'lun': '0'}], 'devtype': 'iSCSI', 'physicalblocksize': '512',
>     'pvUUID': '', 'serial':
>     'SLIO-ORG_lun0_bdev_96188952-7277-4e97-a2aa-f5dd6aad6fc2', 'GUID':
>     '360014059618895272774e97a2aaf5dd6', 'productID': 'lun0_bdev'}]}
>     from=::ffff:192.168.201.4,45636,
>     flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a,
>     task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:52)
>
>
>     2017-11-18 06:14:31,788-0500 INFO  (jsonrpc/0) [vdsm.api] FINISH
>     getStorageDomainInfo return={'info': {'uuid':
>     'cc61e074-a3b6-4371-9185-66079a39f123', 'vgMetadataDevice':
>     '360014059618895272774e97a2aaf5dd6', 'vguuid': '7ifbmt-0elj-uWZZ-zS
>     LG-plA8-8hd3-JG298b', 'metadataDevice':
>     '360014059618895272774e97a2aaf5dd6', 'state': 'OK', 'version':
>     '4', 'role': 'Regular', 'type': 'ISCSI', 'class': 'Data', 'pool':
>     [], 'name': 'iscsi'}} from=::ffff:192.168.201.4,45636, flow_id=2c1876
>     99, task_id=c2080b61-d4a5-4bdb-9d75-f81580a8257a (api:
>
>     vdsm host1:
>
>     2017-11-18 06:16:34,315-0500 ERROR (monitor/c65437c)
>     [storage.Monitor] Setting up monitor for
>     c65437ce-339f-4b01-aeb5-45c1d486bf49 failed (monitor:329)
>     Traceback (most recent call last):
>       File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py",
>     line 326, in _setupLoop
>         self._setupMonitor()
>       File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py",
>     line 348, in _setupMonitor
>         self._produceDomain()
>       File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 177,
>     in wrapper
>         value = meth(self, *a, **kw)
>       File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py",
>     line 366, in _produceDomain
>         self.domain = sdCache.produce(self.sdUUID)
>       File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
>     line 110, in produce
>         domain.getRealDomain()
>       File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
>     line 51, in getRealDomain
>         return self._cache._realProduce(self._sdUUID)
>       File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
>     line 134, in _realProduce
>         domain = self._findDomain(sdUUID)
>       File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
>     line 151, in _findDomain
>         return findMethod(sdUUID)
>       File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py",
>     line 176, in _findUnfetchedDomain
>         raise se.StorageDomainDoesNotExist(sdUUID)
>     StorageDomainDoesNotExist: Storage domain does not exist:
>     (u'c65437ce-339f-4b01-aeb5-45c1d486bf49',)
>     2017-11-18 06:16:40,377-0500 INFO  (jsonrpc/7) [api.host] START
>     getStats() from=::ffff:192.168.201.4,58722 (api:46)
>     2017-11-18 06:16:40,378-0500 INFO  (jsonrpc/7) [vdsm.api] START
>     repoStats(domains=()) from=::ffff:192.168.201.4,58722,
>     task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:46)
>     2017-11-18 06:16:40,379-0500 INFO  (jsonrpc/7) [vdsm.api] FINISH
>     repoStats return={u'c65437ce-339f-4b01-aeb5-45c1d486bf49':
>     {'code': 358, 'actual': True, 'version': -1, 'acquired': False,
>     'delay': '0', 'lastCheck': '6.1', 'valid': False},
>      u'cc61e074-a3b6-4371-9185-66079a39f123': {'code': 0, 'actual':
>     True, 'version': 4, 'acquired': True, 'delay': '0.00103987',
>     'lastCheck': '6.5', 'valid': True}}
>     from=::ffff:192.168.201.4,58722,
>     task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8
>      (api:52)
>
>     engine log:
>
>     2017-11-18 06:15:54,040-05 ERROR
>     [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy]
>     (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Domain
>     'c65437ce-339f-4b01-aeb5-45c1d486bf49:nfs' was reported with error
>     code '358'
>     2017-11-18 06:15:54,041-05 ERROR
>     [org.ovirt.engine.core.bll.InitVdsOnUpCommand]
>     (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Storage
>     Domain 'nfs' of pool 'test-dc' is in problem in host
>     'lago-basic-suite-master-host-1'
>     2017-11-18 06:15:54,045-05 ERROR
>     [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>     (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] EVENT_ID:
>     VDS_STORAGE_VDS_STATS_FAILED(189), Host lago-basic-suite-master-host-1
>     reports about one of the Active Storage Domains as Problematic.
>
>
>     </error>
>
>
>
>
>
>


