On Thu, Apr 20, 2017 at 13:51, Yaniv Kaul <ykaul(a)redhat.com> wrote:
On Thu, Apr 20, 2017 at 1:32 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>
>
> On Thu, Apr 20, 2017 at 13:05, Piotr Kliczewski <piotr.kliczewski(a)gmail.com> wrote:
>
>> On Thu, Apr 20, 2017 at 11:49 AM, Yaniv Kaul <ykaul(a)redhat.com> wrote:
>> >
>> >
>> > On Thu, Apr 20, 2017 at 11:55 AM, Piotr Kliczewski <piotr.kliczewski(a)gmail.com> wrote:
>> >>
>> >> On Thu, Apr 20, 2017 at 10:32 AM, Yaniv Kaul <ykaul(a)redhat.com> wrote:
>> >> > No, that's not the issue.
>> >> > I've seen it happen a few times.
>> >> >
>> >> > 1. It always happens with the ISO domain (which we don't use anyway in o-s-t).
>> >> > 2. Apparently, only one host is asking for a mount:
>> >> > authenticated mount request from 192.168.201.4:713 for /exports/nfs/iso (/exports/nfs/iso)
>> >> >
>> >> > (/var/log/messages of the NFS server)
>> >> >
>> >> > And indeed, you can see in [1] that host1 made the request and all is well on it.
>> >> >
>> >> > However, there are connection issues with host0 which cause a
>> >> > timeout in connectStorageServer(). From [2]:
>> >> >
>> >> > 2017-04-19 18:58:58,465-04 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Message received: {"jsonrpc":"2.0","error":{"code":"lago-basic-suite-master-host0:192912448","message":"Vds timeout occured"},"id":null}
>> >> >
>> >> > 2017-04-19 18:58:58,475-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-37) [755b908a] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM lago-basic-suite-master-host0 command ConnectStorageServerVDS failed: Message timeout which can be caused by communication issues
>> >> > 2017-04-19 18:58:58,475-04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-7-thread-37) [755b908a] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand' return value 'ServerConnectionStatusReturn:{status='Status [code=5022, message=Message timeout which can be caused by communication issues]'}
>> >> >
>> >> >
>> >> > I wonder why, but in /var/log/messages [3] I'm seeing:
>> >> > Apr 19 18:56:58 lago-basic-suite-master-host0 journal: vdsm Executor WARN Worker blocked: <Worker name=jsonrpc/3 running <Task <JsonRpcTask {'params': {u'connectionParams': [{u'id': u'4ca8fc84-d872-4a7f-907f-9445bda7b6d1', u'connection': u'192.168.201.3:/exports/nfs/share1', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'protocol_version': u'4.2', u'password': '********', u'port': u''}], u'storagepoolID': u'00000000-0000-0000-0000-000000000000', u'domainType': 1}, 'jsonrpc': '2.0', 'method': u'StoragePool.connectStorageServer', 'id': u'057da9c2-1e67-4c2f-9511-7d9de250386b'} at 0x2f44110> timeout=60, duration=60 at 0x2f44310> task#=9 at 0x2ac11d0>
>> >> > ...
>> >> >
>> >>
>> >> I see the following sequence:
>> >>
>> >> The message is sent:
>> >>
>> >> 2017-04-19 18:55:58,020-04 DEBUG
>> >> [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient]
>> >> (org.ovirt.thread.pool-7-thread-37) [755b908a] Message sent: SEND
>> >> destination:jms.topic.vdsm_requests
>> >> content-length:381
>> >> ovirtCorrelationId:755b908a
>> >> reply-to:jms.topic.vdsm_responses
>> >>
>> >> <JsonRpcRequest id: "057da9c2-1e67-4c2f-9511-7d9de250386b", method: StoragePool.connectStorageServer, params: {storagepoolID=00000000-0000-0000-0000-000000000000, domainType=1, connectionParams=[{password=, protocol_version=4.2, port=, iqn=, connection=192.168.201.3:/exports/nfs/share1, id=4ca8fc84-d872-4a7f-907f-9445bda7b6d1, user=, tpgt=1}]}>
>> >>
>> >> There is no response within the specified amount of time and we time out:
>> >>
>> >> 2017-04-19 18:58:58,465-04 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Message received: {"jsonrpc":"2.0","error":{"code":"lago-basic-suite-master-host0:192912448","message":"Vds timeout occured"},"id":null}
>> >>
>> >> As Yaniv pointed out, here is why we never get the response:
>> >>
>> >> Apr 19 18:58:58 lago-basic-suite-master-host0 journal: vdsm Executor WARN Worker blocked: <Worker name=jsonrpc/3 running <Task <JsonRpcTask {'params': {u'connectionParams': [{u'id': u'4ca8fc84-d872-4a7f-907f-9445bda7b6d1', u'connection': u'192.168.201.3:/exports/nfs/share1', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'protocol_version': u'4.2', u'password': '********', u'port': u''}], u'storagepoolID': u'00000000-0000-0000-0000-000000000000', u'domainType': 1}, 'jsonrpc': '2.0', 'method': u'StoragePool.connectStorageServer', 'id': u'057da9c2-1e67-4c2f-9511-7d9de250386b'} at 0x2f44110> timeout=60, duration=180 at 0x2f44310> task#=9 at 0x2ac11d0>
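>> >>
>> >> To make the client side concrete, here is a minimal sketch of this
>> >> behavior (illustrative names, not the actual ovirt-vdsm-jsonrpc-client
>> >> code): when nothing arrives before the deadline, the client fabricates
>> >> the error envelope itself, which is why the message above has "id": null:
>> >>
>> >> import json
>> >> import threading
>> >>
>> >> def call_with_timeout(send, call_id, host, timeout=180):
>> >>     # Sketch only: send a request, then wait for its response.
>> >>     done = threading.Event()
>> >>     responses = {}
>> >>
>> >>     def on_response(payload):
>> >>         # Would be wired into the reactor's response dispatching.
>> >>         responses[call_id] = payload
>> >>         done.set()
>> >>
>> >>     send(call_id, on_response)
>> >>     if done.wait(timeout):
>> >>         return responses[call_id]
>> >>     # No response in time: synthesize the error locally; "occured"
>> >>     # is spelled exactly as in the actual log message above.
>> >>     return json.dumps({
>> >>         "jsonrpc": "2.0",
>> >>         "error": {"code": host, "message": "Vds timeout occured"},
>> >>         "id": None,
>> >>     })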
>>
>
> This means the connection attempt was stuck for 180 seconds. We need to
> check whether the mount itself was stuck, or whether there is some issue
> in supervdsm running it.
>
> This warning was introduced recently; before that, a stuck worker was simply hidden.
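>
> Roughly, the detection is just a watchdog thread; an illustrative sketch
> (not vdsm's actual Executor code):
>
> import logging
> import threading
> import time
>
> class WorkerWatchdog(threading.Thread):
>     # Sketch only: periodically report workers whose current task has
>     # been running past its timeout. The reported duration keeps
>     # growing (60, 120, 180...) while the worker stays stuck, as in
>     # the /var/log/messages lines above.
>     def __init__(self, workers, interval=60):
>         threading.Thread.__init__(self)
>         self.daemon = True
>         self._workers = workers  # name -> (task, started, timeout)
>         self._interval = interval
>
>     def run(self):
>         while True:
>             time.sleep(self._interval)
>             now = time.time()
>             for name, (task, started, timeout) in self._workers.items():
>                 duration = now - started
>                 if duration > timeout:
>                     logging.warning(
>                         "Worker blocked: <Worker name=%s running %r "
>                         "timeout=%d, duration=%d>",
>                         name, task, timeout, duration)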
>
> Do we check that the NFS server is up before we start the tests?
>
We do not check, but it is up: it's installed before anything else, and it
happens to be on the Engine, so it has several minutes to be up and ready
(a quick probe one could add is sketched after the mount logs below).
Moreover, a command to the same NFS server, with the same params and only a
different mount point, succeeded a second earlier:
18:55:56,300::supervdsm_server::92::SuperVdsm.ServerCallback::(wrapper) call mount with (u'192.168.201.3:/exports/nfs/share2', u'/rhev/data-center/mnt/192.168.201.3:_exports_nfs_share2') {'vfstype': 'nfs', 'mntOpts': 'soft,nosharecache,timeo=600,retrans=6,nfsvers=4,minorversion=1', 'timeout': None, 'cgroup': None}
...
18:55:56,338::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) return mount with None
But:
18:55:58,076::supervdsm_server::92::SuperVdsm.ServerCallback::(wrapper) call mount with (u'192.168.201.3:/exports/nfs/share1', u'/rhev/data-center/mnt/192.168.201.3:_exports_nfs_share1') {'vfstype': 'nfs', 'mntOpts': 'soft,nosharecache,timeo=600,retrans=6,nfsvers=4,minorversion=2', 'timeout': None, 'cgroup': None}
Got stuck.
The stuck mount uses NFS 4.2; the successful one uses NFS 4.1.
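To poke at this outside o-s-t, something along these lines could both
probe that the export is being served and check whether the v4.2 mount
hangs. Just a sketch: the mountpoint path is made up, it needs root, and
it assumes showmount and coreutils timeout(1) are available:

import subprocess

SERVER = "192.168.201.3"
OPTS = "soft,nosharecache,timeo=600,retrans=6,nfsvers=4,minorversion=2"

def server_exports():
    # Availability probe: list the exports the server offers.
    return subprocess.check_output(["showmount", "-e", SERVER])

def try_mount(export, mountpoint, timeout=60):
    # The same mount supervdsm issued above, but bounded by timeout(1)
    # instead of blocking a jsonrpc worker for minutes. Needs root.
    cmd = ["timeout", str(timeout), "mount", "-t", "nfs", "-o", OPTS,
           "%s:%s" % (SERVER, export), mountpoint]
    rc = subprocess.call(cmd)
    if rc == 124:  # timeout(1) exits 124 when the command hung
        print("mount of %s did not finish within %ss" % (export, timeout))
    return rc

# print(server_exports())
# try_mount("/exports/nfs/share1", "/mnt/nfs42-test")  # made-up path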
See
https://bugzilla.redhat.com/show_bug.cgi?id=1443913
Y.
>
> >>
>> >> >
>> >> > 3. Also, there are still the infamous "unable to update response" issues.
>> >> >
>> >>
>> >> When we see a timeout on a call, our default behavior is to reconnect,
>> >> and on reconnect we clean the pending messages. As a result, when we
>> >> then receive a response to a message sent before the disconnect, we
>> >> say it is unknown to us.
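>> >>
>> >> A minimal sketch of that tracking behavior (illustrative names, not
>> >> the actual ovirt-vdsm-jsonrpc-client code):
>> >>
>> >> import logging
>> >>
>> >> class ResponseTracker(object):
>> >>     # Sketch only: maps in-flight request ids to callbacks.
>> >>     def __init__(self):
>> >>         self._pending = {}  # request id -> callback
>> >>
>> >>     def register(self, request_id, callback):
>> >>         self._pending[request_id] = callback
>> >>
>> >>     def evict(self, request_id):
>> >>         # Called when the call's own timeout expires first.
>> >>         self._pending.pop(request_id, None)
>> >>
>> >>     def on_disconnect(self):
>> >>         # Default behavior after a timeout: reconnect and drop all
>> >>         # pending messages, so their ids are no longer known to us.
>> >>         self._pending.clear()
>> >>
>> >>     def on_response(self, response):
>> >>         callback = self._pending.pop(response["id"], None)
>> >>         if callback is None:
>> >>             # Late response to a purged or timed-out request:
>> >>             logging.error('Not able to update response for "%s"',
>> >>                           response["id"])
>> >>         else:
>> >>             callback(response)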
>> >
>> >
>> > But the example I gave happened earlier than the storage issue?
>>
>> The specific message that you refer to was a ping command, but it timed
>> out (3 secs) before the response arrived, so it was removed from
>> tracking. When the response finally arrived, it was no longer tracked.
>>
>> We may want to increase the timeout to give the response more time to arrive.
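>>
>> Concretely, with a tracker like the one sketched earlier in this thread
>> (illustrative, not the real client code), the timeline from the log
>> quoted below is:
>>
>> PING_ID = "7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0"
>> PING_TIMEOUT = 3  # seconds, the value discussed above
>>
>> tracker = ResponseTracker()  # from the earlier sketch
>> tracker.register(PING_ID, lambda response: None)
>>
>> # t=0      Host.ping sent (18:54:27,843 in the log)
>> # t=3s     the ping timeout expires and the id is dropped:
>> tracker.evict(PING_ID)
>>
>> # t=~4.3s  the reply arrives (18:54:32,132) but is unknown by now:
>> tracker.on_response({"jsonrpc": "2.0", "id": PING_ID, "result": True})
>> # -> ERROR ... Not able to update response for PING_ID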
>>
>> > Y.
>> >
>> >>
>> >>
>> >> >
>> >> >
>>
{"jsonrpc":"2.0","method":"Host.ping","params":{},"id":"7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0"}�
>> >> > 2017-04-19 18:54:27,843-04 DEBUG
>> >> > [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient]
>> >> > (org.ovirt.thread.pool-7-thread-15) [62d198cc] Message sent: SEND
>> >> > destination:jms.topic.vdsm_requests
>> >> > content-length:94
>> >> > ovirtCorrelationId:62d198cc
>> >> > reply-to:jms.topic.vdsm_responses
>> >> >
>> >> > <JsonRpcRequest id: "7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0", method: Host.ping, params: {}>
>> >> > 2017-04-19 18:54:27,885-04 DEBUG
>> >> > [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.impl.Message]
>> >> > (org.ovirt.thread.pool-7-thread-16) [1f9aac13] SEND
>> >> > ovirtCorrelationId:1f9aac13
>> >> > destination:jms.topic.vdsm_requests
>> >> > reply-to:jms.topic.vdsm_responses
>> >> > content-length:94
>> >> >
>> >> > ...
>> >> >
>> >> > {"jsonrpc": "2.0", "id":
"7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0",
>> >> > "result":
>> >> > true}�
>> >> > 2017-04-19 18:54:32,132-04 DEBUG
>> >> > [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker]
>> (ResponseWorker)
>> >> > []
>> >> > Message received: {"jsonrpc": "2.0",
"id":
>> >> > "7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0",
"result": true}
>> >> > 2017-04-19 18:54:32,133-04 ERROR
>> >> > [org.ovirt.vdsm.jsonrpc.client.JsonRpcClient] (ResponseWorker) []
>> Not
>> >> > able
>> >> > to update response for
"7cb6052f-c732-4f7c-bd2d-e48c2ae1f5e0"
>> >> >
>> >> >
>> >> > It would be nice to understand why.
>> >> >
>> >> >
>> >> >
>> >> > 4. Lastly, MOM is not running. Why?
>> >> >
>> >> >
>> >> > Please open a bug with the details from item #2 above.
>> >> > Y.
>> >> >
>> >> >
>> >> > [1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/art...
>> >> > [2] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/art...
>> >> > [3] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/art...
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Apr 20, 2017 at 9:27 AM, Gil Shinar <gshinar(a)redhat.com> wrote:
>> >> >>
>> >> >> Test failed: add_secondary_storage_domains
>> >> >> Link to suspected patches:
>> >> >> Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403
>> >> >> Link to all logs: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6403/art...
>> >> >>
>> >> >>
>> >> >> Error seems to be:
>> >> >> 2017-04-19 18:58:58,774-0400 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='8f9699ed-cc2f-434b-aa1e-b3c8ff30324a') Unexpected error (task:871)
>> >> >> Traceback (most recent call last):
>> >> >>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 878, in _run
>> >> >>     return fn(*args, **kargs)
>> >> >>   File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper
>> >> >>     res = f(*args, **kwargs)
>> >> >>   File "/usr/share/vdsm/storage/hsm.py", line 2709, in getStorageDomainInfo
>> >> >>     dom = self.validateSdUUID(sdUUID)
>> >> >>   File "/usr/share/vdsm/storage/hsm.py", line 298, in validateSdUUID
>> >> >>     sdDom = sdCache.produce(sdUUID=sdUUID)
>> >> >>   File "/usr/share/vdsm/storage/sdc.py", line 112, in produce
>> >> >>     domain.getRealDomain()
>> >> >>   File "/usr/share/vdsm/storage/sdc.py", line 53, in getRealDomain
>> >> >>     return self._cache._realProduce(self._sdUUID)
>> >> >>   File "/usr/share/vdsm/storage/sdc.py", line 136, in _realProduce
>> >> >>     domain = self._findDomain(sdUUID)
>> >> >>   File "/usr/share/vdsm/storage/sdc.py", line 153, in _findDomain
>> >> >>     return findMethod(sdUUID)
>> >> >>   File "/usr/share/vdsm/storage/sdc.py", line 178, in _findUnfetchedDomain
>> >> >>     raise se.StorageDomainDoesNotExist(sdUUID)
>> >> >> StorageDomainDoesNotExist: Storage domain does not exist: (u'ac3bbc93-26ba-4ea8-8e76-c5b761f01931',)
>> >> >> 2017-04-19 18:58:58,777-0400 INFO (jsonrpc/2) [storage.TaskManager.Task] (Task='8f9699ed-cc2f-434b-aa1e-b3c8ff30324a') aborting: Task is aborted: 'Storage domain does not exist' - code 358 (task:1176)
>> >> >> 2017-04-19 18:58:58,777-0400 ERROR (jsonrpc/2) [storage.Dispatcher] {'status': {'message': "Storage domain does not exist: (u'ac3bbc93-26ba-4ea8-8e76-c5b761f01931',)", 'code': 358}} (dispatcher:78)
>> >> >>
>> >> >> _______________________________________________
>> >> >> Devel mailing list
>> >> >> Devel(a)ovirt.org
>> >> >> http://lists.ovirt.org/mailman/listinfo/devel
>> >> >
>> >> >
>> >> >
>> >
>> >
>
>