[ovirt-devel] test-repo_ovirt_experimental_master fails on jsonrpc errors

Piotr Kliczewski piotr.kliczewski at gmail.com
Thu Jan 12 15:26:29 UTC 2017


On Thu, Jan 12, 2017 at 4:18 PM, Yaniv Kaul <ykaul at redhat.com> wrote:
>
>
> On Thu, Jan 12, 2017 at 4:58 PM, Piotr Kliczewski
> <piotr.kliczewski at gmail.com> wrote:
>>
>> On Thu, Jan 12, 2017 at 8:52 AM, Roy Golan <rgolan at redhat.com> wrote:
>> >
>> >
>> > On 11 January 2017 at 21:50, Piotr Kliczewski
>> > <piotr.kliczewski at gmail.com>
>> > wrote:
>> >>
>> >> On Wed, Jan 11, 2017 at 9:47 AM, Daniel Belenky <dbelenky at redhat.com>
>> >> wrote:
>> >> > Hi all,
>> >> >
>> >> > The following job, test-repo_ovirt_experimental_master, fails to pass
>> >> > the basic_suite.
>> >> > The job was triggered by this merge to the vdsm project:
>> >> > https://gerrit.ovirt.org/#/c/69936/
>> >> >
>> >> > The error I suspect caused this issue:
>> >> >
>> >> > 2017-01-11 03:32:26,061-05 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Message received: {"jsonrpc":"2.0","error":{"code":"192.168.201.2:990178830","message":"Vds timeout occured"},"id":null}
>> >> > 2017-01-11 03:32:26,067-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler7) [57bc898] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM command failed: Message timeout which can be caused by communication issues
>> >> > 2017-01-11 03:32:26,069-05 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler7) [57bc898] ERROR, GetStoragePoolInfoVDSCommand( GetStoragePoolInfoVDSCommandParameters:{runAsync='true', storagePoolId='f92af272-934f-4327-9db0-afe353e6f61c', ignoreFailoverLimit='true'}), exception: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues, log id: 2f12b94a
>> >> > 2017-01-11 03:32:26,069-05 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler7) [57bc898] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
>> >> > at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
>> >> > at org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand.executeIrsBrokerCommand(GetStoragePoolInfoVDSCommand.java:32) [vdsbroker.jar:]
>> >> > at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.lambda$executeVDSCommand$0(IrsBrokerCommand.java:95) [vdsbroker.jar:]
>> >> > at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy.runInControlledConcurrency(IrsProxy.java:262) [vdsbroker.jar:]
>> >> > at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.executeVDSCommand(IrsBrokerCommand.java:92) [vdsbroker.jar:]
>> >> > at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
>> >> > at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
>> >> > at org.ovirt.engine.core.vdsbroker.vdsbroker.DefaultVdsCommandExecutor.execute(DefaultVdsCommandExecutor.java:14) [vdsbroker.jar:]
>> >> > at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:408) [vdsbroker.jar:]
>> >> > at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy.proceedStoragePoolStats(IrsProxy.java:348) [vdsbroker.jar:]
>> >> > at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy.lambda$updatingTimerElapsed$0(IrsProxy.java:246) [vdsbroker.jar:]
>> >> > at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy.runInControlledConcurrency(IrsProxy.java:262) [vdsbroker.jar:]
>> >> > at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy.updatingTimerElapsed(IrsProxy.java:227) [vdsbroker.jar:]
>> >> > at sun.reflect.GeneratedMethodAccessor191.invoke(Unknown Source) [:1.8.0_111]
>> >> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_111]
>> >> > at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_111]
>> >> > at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
>> >> > at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
>> >> > at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
>> >> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_111]
>> >> > at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_111]
>> >> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111]
>> >> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111]
>> >> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]
>> >> >
>> >>
>> >> This exception occurs when a response does not arrive within the
>> >> specified period of time.
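
(To illustrate the mechanism - this is only a rough sketch of the general
pattern, not the actual vdsm-jsonrpc-java code, and the class and method
names here are made up:)

import time
import uuid

# Hypothetical sketch: every outgoing JSON-RPC request id is tracked with a
# deadline, and a periodic sweep turns expired entries into timeout errors.
class PendingTracker(object):
    def __init__(self, timeout=180):
        self.timeout = timeout      # seconds to wait for a response
        self.pending = {}           # request id -> deadline

    def register(self, request_id=None):
        request_id = request_id or str(uuid.uuid4())
        self.pending[request_id] = time.time() + self.timeout
        return request_id

    def resolve(self, request_id):
        # A response arrived in time; stop tracking the request.
        return self.pending.pop(request_id, None)

    def expired(self):
        # Called periodically; returns the ids whose responses never arrived.
        now = time.time()
        timed_out = [rid for rid, deadline in self.pending.items()
                     if deadline <= now]
        for rid in timed_out:
            del self.pending[rid]
        return timed_out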
>> >>
>> >> Here is a potential reason for it:
>> >>
>> >> 2017-01-11 03:23:17,944 ERROR (jsonrpc/5) [storage.TaskManager.Task] (Task='db7a84ba-d89c-4ac7-ab7a-b9f409ea7365') Unexpected error (task:870)
>> >> Traceback (most recent call last):
>> >>   File "/usr/share/vdsm/storage/task.py", line 877, in _run
>> >>     return fn(*args, **kargs)
>> >>   File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
>> >>     res = f(*args, **kwargs)
>> >>   File "/usr/share/vdsm/storage/hsm.py", line 3054, in getVolumeInfo
>> >>     volUUID=volUUID).getInfo()
>> >>   File "/usr/share/vdsm/storage/sd.py", line 748, in produceVolume
>> >>     volUUID)
>> >>   File "/usr/share/vdsm/storage/blockVolume.py", line 415, in __init__
>> >>     manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
>> >>   File "/usr/share/vdsm/storage/blockVolume.py", line 69, in __init__
>> >>     volUUID)
>> >>   File "/usr/share/vdsm/storage/volume.py", line 84, in __init__
>> >>     self.validate()
>> >>   File "/usr/share/vdsm/storage/blockVolume.py", line 159, in validate
>> >>     raise se.VolumeDoesNotExist(self.volUUID)
>> >> VolumeDoesNotExist: Volume does not exist: (u'5a464296-ebea-4d1f-a299-cec45e82f9f3',)
>> >
>> >
>> >
>> > If that is the case then we should reply with an error on this API call
>> > instead of ignoring it and letting the timeout trigger. Do you think
>> > this is what happened?
>>
>> Reply to what? Let me explain how I see it.
>>
>> We send vdsm a request which takes a bit of time to process. During
>> processing there was an issue that forced us to reset the connection,
>> which also reset the request tracking and buffers.
>
>
> What issue?
> I'm concerned about those unexplained disconnections. MOM from VDSM and
> Engine from VDSM.
> Y.
>

It was just an example. In this context, for me it can be 30 seconds
without any interaction with vdsm. If a higher-level timeout occurs, the
effect is the same.
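
Roughly, by "no interaction" I mean an idle check like this (just a sketch,
not the real vdsm code; 30 is only the example value from above):

import time

# Hypothetical sketch of an idle-connection check: if nothing was sent or
# received for `interval` seconds, the connection is treated as dead and
# gets reset, which also throws away the tracking state mentioned above.
class IdleWatchdog(object):
    def __init__(self, interval=30):
        self.interval = interval
        self.last_activity = time.time()

    def touch(self):
        # Call on every message sent or received.
        self.last_activity = time.time()

    def is_stale(self):
        return time.time() - self.last_activity > self.interval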

For mom the behavior is different. As far as I remember, mom relies on GC
to close the client, which closes the connection. M2Crypto complains about
it, but I haven't seen those complaints when using the standard ssl module.
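
Something along these lines (a simplified sketch, not the actual mom code):

import socket
import ssl

# Hypothetical sketch: the connection is torn down only when the object is
# garbage collected, so close() runs at an unpredictable time - which is
# when SSL wrappers like M2Crypto tend to complain.
class Client(object):
    def __init__(self, host, port):
        raw = socket.create_connection((host, port))
        self.sock = ssl.create_default_context().wrap_socket(
            raw, server_hostname=host)

    def close(self):
        sock = getattr(self, "sock", None)
        if sock is not None:
            sock.close()

    def __del__(self):
        # No explicit close() anywhere else: teardown is left to the GC.
        self.close()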

>
>>
>> Once the response arrives we have no information about it.
>>
>> When the connection is reset we let the command code know that the issue
>> occurred, so it is up to that code to handle it - retry or fail.
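
(For example - just a sketch of what I mean by "retry or fail", not actual
engine or vdsm code:)

import time

class TransportReset(Exception):
    """Hypothetical signal from the transport that the connection was reset."""

def run_with_retry(call, retries=3, delay=2):
    # The command code owns the policy: here it retries a few times and
    # then lets the failure propagate to the caller.
    for attempt in range(1, retries + 1):
        try:
            return call()
        except TransportReset:
            if attempt == retries:
                raise
            time.sleep(delay)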
>>
>> >>
>> >>
>> >> >
>> >> > Attached is a zip file with all artifacts from Jenkins.
>> >> >
>> >> > The error I've mentioned above is found in:
>> >> >
>> >> > exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity.py/lago-basic-suite-master-engine/_var_log_ovirt-engine/engine.log
>> >> >
>> >> > Can someone advise?
>> >> >
>> >> > Thanks,
>> >> > --
>> >> > Daniel Belenky
>> >> > RHV DevOps
>> >> > Red Hat Israel
>> >> >

