[ovirt-devel] [ OST Failure Report ] [ oVirt master ] [ 20.4.17 ] [ add_secondary_storage_domains ]

Yaniv Kaul ykaul at redhat.com
Thu Apr 20 08:32:14 UTC 2017


No, that's not the issue.
I've seen it happening few times.

1. It always with the ISO domain (which we don't use anyway in o-s-t)
2. Apparently, only one host is asking for a mount:
 authenticated mount request from 192.168.201.4:713 for /exports/nfs/iso
(/exports/nfs/iso)

(/var/log/messages of the NFS server)

And indeed, you can see in[1] that host1 made the request and all is well
on it.

However, there are connection issues with host0 which cause a timeout to
connectStorageServer():
From[2]:

2017-04-19 18:58:58,465-04 DEBUG
[org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) []
Message received:
{"jsonrpc":"2.0","error":{"code":"lago-basic-suite-master-host0:192912448","message":"Vds
timeout occured"},"id":null}

2017-04-19 18:58:58,475-04 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(org.ovirt.thread.pool-7-thread-37) [755b908a] EVENT_ID:
VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VDSM lago-basic-suite-master-host0
command ConnectStorageServerVDS failed: Message timeout which can be
caused by communication issues
2017-04-19 18:58:58,475-04 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
(org.ovirt.thread.pool-7-thread-37) [755b908a] Command
'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand'
return value '
ServerConnectionStatusReturn:{status='Status [code=5022,
message=Message timeout which can be caused by communication issues]'}


I wonder why, but on /var/log/messages[3], I'm seeing:
Apr 19 18:56:58 lago-basic-suite-master-host0 journal: vdsm Executor WARN
Worker blocked: <Worker name=jsonrpc/3 running <Task <JsonRpcTask
{'params': {u'connectionParams': [{u'id':
u'4ca8fc84-d872-4a7f-907f-9445bda7b6d1', u'connection':
u'192.168.201.3:/exports/nfs/share1',
u'iqn': u'', u'user': u'', u'tpgt': u'1', u'protocol_version': u'4.2',
u'password': '********', u'port': u''}], u'storagepoolID':
u'00000000-0000-0000-0000-000000000000', u'domainType': 1}, 'jsonrpc':
'2.0', 'method': u'StoragePool.connectStorageServer', 'id':
u'057da9c2-1e67-4c2f-9511-7d9de250386b'} at 0x2f44110> timeout=60,
duration=60 at 0x2f44310> task#=9 at 0x2ac11d0>
...


3. Also, there is still the infamous unable to update response issues.

{"jsonrpc":"2.0","method":"Host.ping","params":{},"id":"7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0"}�
2017-04-19 18:54:27,843-04 DEBUG
[org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient]
(org.ovirt.thread.pool-7-thread-15) [62d198cc] Message sent: SEND
destination:jms.topic.vdsm_requests
content-length:94
ovirtCorrelationId:62d198cc
reply-to:jms.topic.vdsm_responses

<JsonRpcRequest id: "7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0", method:
Host.ping, params: {}>
2017-04-19 18:54:27,885-04 DEBUG
[org.ovirt.vdsm.jsonrpc.client.reactors.stomp.impl.Message]
(org.ovirt.thread.pool-7-thread-16) [1f9aac13] SEND
ovirtCorrelationId:1f9aac13
destination:jms.topic.vdsm_requests
reply-to:jms.topic.vdsm_responses
content-length:94

...

{"jsonrpc": "2.0", "id": "7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0",
"result": true}�
2017-04-19 18:54:32,132-04 DEBUG
[org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker]
(ResponseWorker) [] Message received: {"jsonrpc": "2.0", "id":
"7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0", "result": true}
2017-04-19 18:54:32,133-04 ERROR
[org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) [] Not
able to update response for "7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0"


Would be nice to understand why.



4. Lastly, MOM is not running. Why?


Please open a bug with the details from item #2 above.
Y.


[1]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host1/_var_log/vdsm/supervdsm.log
[2]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
[3]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host0/_var_log/messages





On Thu, Apr 20, 2017 at 9:27 AM, Gil Shinar <gshinar at redhat.com> wrote:

> Test failed: add_secondary_storage_domains
> Link to suspected patches:
> Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_
> experimental_master/6403
> Link to all logs: http://jenkins.ovirt.org/job/test-repo_ovirt_
> experimental_master/6403/artifact/exported-artifacts/
> basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py
>
>
> Error seems to be:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *2017-04-19 18:58:58,774-0400 ERROR (jsonrpc/2) [storage.TaskManager.Task]
> (Task='8f9699ed-cc2f-434b-aa1e-b3c8ff30324a') Unexpected error
> (task:871)Traceback (most recent call last):  File
> "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 878, in _run
>   return fn(*args, **kargs)  File
> "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper
> res = f(*args, **kwargs)  File "/usr/share/vdsm/storage/hsm.py", line 2709,
> in getStorageDomainInfo    dom = self.validateSdUUID(sdUUID)  File
> "/usr/share/vdsm/storage/hsm.py", line 298, in validateSdUUID    sdDom =
> sdCache.produce(sdUUID=sdUUID)  File "/usr/share/vdsm/storage/sdc.py", line
> 112, in produce    domain.getRealDomain()  File
> "/usr/share/vdsm/storage/sdc.py", line 53, in getRealDomain    return
> self._cache._realProduce(self._sdUUID)  File
> "/usr/share/vdsm/storage/sdc.py", line 136, in _realProduce    domain =
> self._findDomain(sdUUID)  File "/usr/share/vdsm/storage/sdc.py", line 153,
> in _findDomain    return findMethod(sdUUID)  File
> "/usr/share/vdsm/storage/sdc.py", line 178, in _findUnfetchedDomain
> raise se.StorageDomainDoesNotExist(sdUUID)StorageDomainDoesNotExist:
> Storage domain does not exist:
> (u'ac3bbc93-26ba-4ea8-8e76-c5b761f01931',)2017-04-19 18:58:58,777-0400 INFO
>  (jsonrpc/2) [storage.TaskManager.Task]
> (Task='8f9699ed-cc2f-434b-aa1e-b3c8ff30324a') aborting: Task is aborted:
> 'Storage domain does not exist' - code 358 (task:1176)2017-04-19
> 18:58:58,777-0400 ERROR (jsonrpc/2) [storage.Dispatcher] {'status':
> {'message': "Storage domain does not exist:
> (u'ac3bbc93-26ba-4ea8-8e76-c5b761f01931',)", 'code': 358}} (dispatcher:78)*
>
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ovirt.org/pipermail/devel/attachments/20170420/5b996dc5/attachment.html>


More information about the Devel mailing list