[ OST Failure Report ] [ oVirt Master (vdsm) ] [ 01-02-2018 ] [ 002_bootstrap.verify_add_all_hosts ]
by Dafna Ron
Hi,
We failed the CQ test 002_bootstrap.verify_add_all_hosts for the vdsm
project on master.
Looking at the log, vdsm cannot find the master storage domain, and engine
moves the host to the Non-Operational state.
Although on the surface the patch seems to be related, the master storage
domain is iSCSI while the patch is related to gluster.
I do not think there is a connection between the patch and the failure, but
can you please have a look to make sure?
*Link and headline of suspected patches:*
https://gerrit.ovirt.org/#/c/69668/ - gluster: Fix error when brick is on a btrfs subvolume

*Link to Job:*
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5180/

*Link to all logs:*
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5180/artifact/

*(Relevant) error snippet from the log:*
<error>
*vdsm:*
2018-02-01 03:13:49,211-0500 INFO (jsonrpc/4) [vdsm.api] START createStorageDomain(storageType=3, sdUUID=u'077add35-9171-45d5-b6de-79cc5a853c36', domainName=u'iscsi', typeSpecificArg=u'IdW3HG-K1Af-e0d3-u2O3-rGle-8fk5-ACNk6C', domClass=1, domVersion=u'4', options=None) from=::ffff:192.168.201.4,58530, flow_id=22d4ffd8, task_id=2ce6dd52-3d28-4532-abbf-d78d52af6cda (api:46)
2018-02-01 03:14:40,223-0500 INFO (jsonrpc/7) [vdsm.api] START connectStoragePool(spUUID=u'2570c0c9-f872-4e49-964a-ee533a79c3f2', hostID=1, msdUUID=u'077add35-9171-45d5-b6de-79cc5a853c36', masterVersion=1, domainsMap={u'077add35-9171-45d5-b6de-79cc5a853c36': u'active'}, options=None) from=::ffff:192.168.201.4,36310, flow_id=19e9aa89, task_id=878419a0-c5ce-4e35-aed5-b27d56b2886e (api:46)
2018-02-01 03:14:40,225-0500 INFO (jsonrpc/7) [storage.StoragePoolMemoryBackend] new storage pool master version 1 and domains map {u'077add35-9171-45d5-b6de-79cc5a853c36': u'Active'} (spbackends:449)
2018-02-01 03:14:40,225-0500 INFO (jsonrpc/7) [storage.StoragePool] updating pool 2570c0c9-f872-4e49-964a-ee533a79c3f2 backend from type NoneType instance 0x7f45919e3f20 to type StoragePoolMemoryBackend instance 0x45411b0 (sp:157)
2018-02-01 03:14:40,226-0500 INFO (jsonrpc/7) [storage.StoragePool] Connect host #1 to the storage pool 2570c0c9-f872-4e49-964a-ee533a79c3f2 with master domain: 077add35-9171-45d5-b6de-79cc5a853c36 (ver = 1) (sp:692)
2018-02-01 03:14:40,462-0500 INFO (jsonrpc/7) [vdsm.api] FINISH connectStoragePool error=Cannot find master domain: u'spUUID=2570c0c9-f872-4e49-964a-ee533a79c3f2, msdUUID=077add35-9171-45d5-b6de-79cc5a853c36' from=::ffff:192.168.201.4,36310, flow_id=19e9aa89, task_id=878419a0-c5ce-4e35-aed5-b27d56b2886e (api:50)
2018-02-01 03:14:40,462-0500 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='878419a0-c5ce-4e35-aed5-b27d56b2886e') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in connectStoragePool
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1032, in connectStoragePool
    spUUID, hostID, msdUUID, masterVersion, domainsMap)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1094, in _connectStoragePool
    res = pool.connect(hostID, msdUUID, masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 704, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1275, in __rebuild
    self.setMasterDomain(msdUUID, masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1488, in setMasterDomain
    raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
StoragePoolMasterNotFound: Cannot find master domain: u'spUUID=2570c0c9-f872-4e49-964a-ee533a79c3f2, msdUUID=077add35-9171-45d5-b6de-79cc5a853c36'
2018-02-01 03:14:40,466-0500 INFO (jsonrpc/7) [storage.TaskManager.Task] (Task='878419a0-c5ce-4e35-aed5-b27d56b2886e') aborting: Task is aborted: "Cannot find master domain: u'spUUID=2570c0c9-f872-4e49-964a-ee533a79c3f2, msdUUID=077add35-9171-45d5-b6de-79cc5a853c36'" - code 304 (task:1181)
2018-02-01 03:14:40,467-0500 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH connectStoragePool error=Cannot find master domain: u'spUUID=2570c0c9-f872-4e49-964a-ee533a79c3f2, msdUUID=077add35-9171-45d5-b6de-79cc5a853c36' (dispatcher:82)
2018-02-01 03:14:40,467-0500 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call StoragePool.connect failed (error 304) in 0.25 seconds (__init__:573)
*engine:*
2018-02-01 03:14:40,603-05 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [ba52086] EVENT_ID: VDS_SET_NONOPERATIONAL_DOMAIN(522), Host lago-basic-suite-master-host-0 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center test-dc. Setting Host state to Non-Operational.
2018-02-01 03:14:40,608-05 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [ba52086] EVENT_ID: VDS_ALERT_FENCE_IS_NOT_CONFIGURED(9,000), Failed to verify Power Management configuration for Host lago-basic-suite-master-host-0.
2018-02-01 03:14:40,610-05 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [ba52086] EVENT_ID: CONNECT_STORAGE_POOL_FAILED(995), Failed to connect Host lago-basic-suite-master-host-0 to Storage Pool test-dc
*</error>*
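For anyone triaging a similar failure, here is a minimal sketch of how the vdsm lines in the snippet above could be filtered down to the ERROR entries of a single worker thread (jsonrpc/7 in this report). The regular expression is inferred only from the line layout visible in this report, not from any vdsm logging specification, and the names VDSM_LINE, parse_vdsm_lines, errors_for_thread and SAMPLE are hypothetical helpers made up for this illustration. SAMPLE is an abbreviated copy of the lines above, with the traceback body left out.

```python
import re

# Assumed layout, inferred from the snippet in this report:
#   <date> <time>,<ms><tz> LEVEL (thread) [logger] message
VDSM_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) "
    r"(?P<level>[A-Z]+)\s+"
    r"\((?P<thread>[^)]+)\) "
    r"\[(?P<logger>[^\]]+)\] "
    r"(?P<message>.*)$"
)


def parse_vdsm_lines(text):
    """Yield one dict per log entry.

    Lines that do not start with a timestamp (traceback lines, for
    example) are appended to the message of the preceding entry.
    """
    entry = None
    for line in text.splitlines():
        match = VDSM_LINE.match(line)
        if match:
            if entry is not None:
                yield entry
            entry = match.groupdict()
        elif entry is not None:
            entry["message"] += "\n" + line
    if entry is not None:
        yield entry


def errors_for_thread(text, thread):
    """Return the ERROR entries logged by the given worker thread."""
    return [
        entry
        for entry in parse_vdsm_lines(text)
        if entry["thread"] == thread and entry["level"] == "ERROR"
    ]


# Abbreviated copy of lines from the report (traceback body omitted).
SAMPLE = "\n".join([
    "2018-02-01 03:14:40,226-0500 INFO (jsonrpc/7) [storage.StoragePool] "
    "Connect host #1 to the storage pool 2570c0c9-f872-4e49-964a-ee533a79c3f2 "
    "with master domain: 077add35-9171-45d5-b6de-79cc5a853c36 (ver = 1) (sp:692)",
    "2018-02-01 03:14:40,462-0500 ERROR (jsonrpc/7) [storage.TaskManager.Task] "
    "(Task='878419a0-c5ce-4e35-aed5-b27d56b2886e') Unexpected error (task:875)",
    "StoragePoolMasterNotFound: Cannot find master domain: "
    "u'spUUID=2570c0c9-f872-4e49-964a-ee533a79c3f2, "
    "msdUUID=077add35-9171-45d5-b6de-79cc5a853c36'",
    "2018-02-01 03:14:40,467-0500 ERROR (jsonrpc/7) [storage.Dispatcher] "
    "FINISH connectStoragePool error=Cannot find master domain: "
    "u'spUUID=2570c0c9-f872-4e49-964a-ee533a79c3f2, "
    "msdUUID=077add35-9171-45d5-b6de-79cc5a853c36' (dispatcher:82)",
])

if __name__ == "__main__":
    for err in errors_for_thread(SAMPLE, "jsonrpc/7"):
        print(err["ts"], err["logger"])
        print(err["message"])
```

Folding continuation lines into the previous entry keeps the exception name next to the ERROR line that produced it, which is usually the first thing to look at in a report like this one.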
[oVirt Jenkins] ovirt-appliance_master_build-artifacts-el7-x86_64 - Build # 691 - Failure!
by jenkins@jenkins.phx.ovirt.org
Project: http://jenkins.ovirt.org/job/ovirt-appliance_master_build-artifacts-el7-x...
Build: http://jenkins.ovirt.org/job/ovirt-appliance_master_build-artifacts-el7-x...
Build Number: 691
Build Status: Failure
Triggered By: Started by timer
-------------------------------------
Changes Since Last Success:
-------------------------------------
Changes for Build #691
[Dafna Ron] ovirt-setup-lib - dropping fc24
[Sandro Bonazzola] ovirt-setup-lib: re-align
[Dafna Ron] ovirt-iso-uploader - dropping fc24
[Sandro Bonazzola] ovirt-iso-uploader: re-align
[Dafna Ron] ovirt-guest-agent - dropping fc24
[Dafna Ron] ovirt-host-deploy - dropping fc24
[Sandro Bonazzola] ovirt-host-deploy: re-align
[Dafna Ron] ovirt-engine-wildfly - dropping fc24
[Sandro Bonazzola] ovirt-engine-wildfly: re-align
[Evgheni Dereveanchin] Update mirror snapshot for FC27 to fix gdbm
[Dafna Ron] ovirt-engine-wildfly-overlay - dropping fc24
[Yuval Turgeman] appliance-report: try to mount /var
-----------------
Failed Tests:
-----------------
No tests ran.
[JIRA] (OVIRT-1872) Jenkins outage 01.02.2018
by Evgheni Dereveanchin (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1872?page=com.atlassian.jir... ]
Evgheni Dereveanchin reassigned OVIRT-1872:
-------------------------------------------
Assignee: Evgheni Dereveanchin (was: infra)
> Jenkins outage 01.02.2018
> -------------------------
>
> Key: OVIRT-1872
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1872
> Project: oVirt - virtualization made easy
> Issue Type: Outage
> Reporter: Evgheni Dereveanchin
> Assignee: Evgheni Dereveanchin
>
> The Jenkins UI was slow and unreachable for several hours today. No major issues were reported in the log, yet CPU usage was very high. Jenkins was restarted to fix the issue; this ticket is opened to investigate the cause and fix side effects, if any.
--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100078)
[JIRA] (OVIRT-1872) Jenkins outage 01.02.2018
by Evgheni Dereveanchin (oVirt JIRA)
Evgheni Dereveanchin created OVIRT-1872:
-------------------------------------------
Summary: Jenkins outage 01.02.2018
Key: OVIRT-1872
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1872
Project: oVirt - virtualization made easy
Issue Type: Outage
Reporter: Evgheni Dereveanchin
Assignee: infra
The Jenkins UI was slow and unreachable for several hours today. No major issues were reported in the log, yet CPU usage was very high. Jenkins was restarted to fix the issue; this ticket is opened to investigate the cause and fix side effects, if any.
--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100078)