On Thu, Jan 12, 2017 at 4:58 PM, Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:
On Thu, Jan 12, 2017 at 8:52 AM, Roy Golan <rgolan@redhat.com> wrote:
>
>
> On 11 January 2017 at 21:50, Piotr Kliczewski <piotr.kliczewski@gmail.com>
> wrote:
>>
>> On Wed, Jan 11, 2017 at 9:47 AM, Daniel Belenky <dbelenky@redhat.com>
>> wrote:
>> > Hi all,
>> >
>> > The following job, test-repo_ovirt_experimental_master, fails to pass the
>> > basic_suite.
>> > The job was triggered by this merge to the vdsm project:
>> > https://gerrit.ovirt.org/#/c/69936/
>> >
>> > The error I suspect is causing this issue:
>> >
>> > 2017-01-11 03:32:26,061-05 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Message received: {"jsonrpc":"2.0","error":{"code":"192.168.201.2:990178830","message":"Vds timeout occured"},"id":null}
>> > 2017-01-11 03:32:26,067-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler7) [57bc898] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM command failed: Message timeout which can be caused by communication issues
>> > 2017-01-11 03:32:26,069-05 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler7) [57bc898] ERROR, GetStoragePoolInfoVDSCommand( GetStoragePoolInfoVDSCommandParameters:{runAsync='true', storagePoolId='f92af272-934f-4327-9db0-afe353e6f61c', ignoreFailoverLimit='true'}), exception: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues, log id: 2f12b94a
>> > 2017-01-11 03:32:26,069-05 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler7) [57bc898] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
>> >     at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
>> >     at org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand.executeIrsBrokerCommand(GetStoragePoolInfoVDSCommand.java:32) [vdsbroker.jar:]
>> >     at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.lambda$executeVDSCommand$0(IrsBrokerCommand.java:95) [vdsbroker.jar:]
>> >     at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy.runInControlledConcurrency(IrsProxy.java:262) [vdsbroker.jar:]
>> >     at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.executeVDSCommand(IrsBrokerCommand.java:92) [vdsbroker.jar:]
>> >     at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
>> >     at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
>> >     at org.ovirt.engine.core.vdsbroker.vdsbroker.DefaultVdsCommandExecutor.execute(DefaultVdsCommandExecutor.java:14) [vdsbroker.jar:]
>> >     at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:408) [vdsbroker.jar:]
>> >     at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy.proceedStoragePoolStats(IrsProxy.java:348) [vdsbroker.jar:]
>> >     at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy.lambda$updatingTimerElapsed$0(IrsProxy.java:246) [vdsbroker.jar:]
>> >     at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy.runInControlledConcurrency(IrsProxy.java:262) [vdsbroker.jar:]
>> >     at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy.updatingTimerElapsed(IrsProxy.java:227) [vdsbroker.jar:]
>> >     at sun.reflect.GeneratedMethodAccessor191.invoke(Unknown Source) [:1.8.0_111]
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_111]
>> >     at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_111]
>> >     at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
>> >     at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
>> >     at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_111]
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_111]
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111]
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111]
>> >     at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]
>> >
>>
>> This exception occurs when a response does not arrive within the
>> specified period of time.
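>>
>> Roughly, each call is tracked by its id and the caller only waits a
>> bounded time for the matching response. The sketch below is illustrative
>> only (the names are made up, it is not the actual jsonrpc client code):
>>
>>     // Illustrative: the caller waits a bounded time for a tracked response.
>>     import java.util.concurrent.CompletableFuture;
>>     import java.util.concurrent.TimeUnit;
>>     import java.util.concurrent.TimeoutException;
>>
>>     class PendingCall {
>>         private final CompletableFuture<String> response = new CompletableFuture<>();
>>
>>         // Called by the transport when the matching response arrives in time.
>>         void complete(String json) {
>>             response.complete(json);
>>         }
>>
>>         // Called by the command code; throws if nothing arrives within the period.
>>         String await(long seconds) throws Exception {
>>             try {
>>                 return response.get(seconds, TimeUnit.SECONDS);
>>             } catch (TimeoutException e) {
>>                 throw new Exception("Message timeout which can be caused by communication issues", e);
>>             }
>>         }
>>     }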
>>
>> Here is a potential reason for it:
>>
>> 2017-01-11 03:23:17,944 ERROR (jsonrpc/5) [storage.TaskManager.Task]
>> (Task='db7a84ba-d89c-4ac7-ab7a-b9f409ea7365') Unexpected error
>> (task:870)
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/task.py", line 877, in _run
>>     return fn(*args, **kargs)
>>   File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in
>> wrapper
>>     res = f(*args, **kwargs)
>>   File "/usr/share/vdsm/storage/hsm.py", line 3054, in getVolumeInfo
>>     volUUID=volUUID).getInfo()
>>   File "/usr/share/vdsm/storage/sd.py", line 748, in produceVolume
>>     volUUID)
>>   File "/usr/share/vdsm/storage/blockVolume.py", line 415, in __init__
>>     manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
>>   File "/usr/share/vdsm/storage/blockVolume.py", line 69, in __init__
>>     volUUID)
>>   File "/usr/share/vdsm/storage/volume.py", line 84, in __init__
>>     self.validate()
>>   File "/usr/share/vdsm/storage/blockVolume.py", line 159, in validate
>>     raise se.VolumeDoesNotExist(self.volUUID)
>> VolumeDoesNotExist: Volume does not exist:
>> (u'5a464296-ebea-4d1f-a299-cec45e82f9f3',)
>
>
>
> If that is the case then we should reply with an error on this API call and
> not ignore it and let the timeout trigger. Do you think this is what happened?

Reply to what? Let me say how I see it.

We send vdsm a request which takes a bit of time to process. During
processing there was an issue that forced us to reset the connection,
which also reset the tracking and buffers.

What issue?
I'm concerned about those unexplained disconnections: MOM from VDSM and Engine from VDSM.
Y.

 
Once the response arrives we have no information about it.
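
Roughly, the situation is like the sketch below (illustrative names, not the
real ResponseWorker code): responses are matched to requests through a map
keyed by the call id, a connection reset clears that map, and a response
that arrives afterwards matches nothing, so it is dropped.

    // Illustrative: why a late response is useless after a connection reset.
    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;

    class ResponseTracker {
        private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

        // Remember who is waiting for this request id.
        void track(String id, CompletableFuture<String> call) {
            pending.put(id, call);
        }

        // Resetting the connection throws away all tracking for in-flight calls.
        void onConnectionReset() {
            pending.clear();
        }

        // A response is routed by its id; after a reset there is no entry left,
        // so the response is effectively ignored.
        void onResponse(String id, String json) {
            CompletableFuture<String> call = pending.remove(id);
            if (call != null) {
                call.complete(json);
            }
        }
    }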

When the connection is reset we let the command code know that the issue
occurred, so it is up to the command code to handle it - retry or fail.
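
Something along these lines on the caller's side (a hypothetical helper, not
the actual engine code):

    // Illustrative: the command code, not the transport, decides whether to retry or fail.
    import java.util.concurrent.Callable;

    class VdsCallPolicy {
        static String retryOrFail(Callable<String> call, int maxAttempts) throws Exception {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return call.call();   // may throw, e.g. on a message timeout
                } catch (Exception failure) {
                    last = failure;       // remember the failure and try again
                }
            }
            throw last;                   // out of attempts: fail the command
        }
    }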

>>
>>
>> >
>> > Attached is a zip file with all artifacts from Jenkins.
>> >
>> > The error I've mentioned above is found in:
>> >
>> >
>> > exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity.py/lago-basic-suite-master-engine/_var_log_ovirt-engine/engine.log
>> >
>> > Can someone advise?
>> >
>> > Thanks,
>> > --
>> > Daniel Belenky
>> > RHV DevOps
>> > Red Hat Israel
>> >
>
>
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel