<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 12, 2017 at 4:58 PM, Piotr Kliczewski <span dir="ltr"><<a href="mailto:piotr.kliczewski@gmail.com" target="_blank">piotr.kliczewski@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Thu, Jan 12, 2017 at 8:52 AM, Roy Golan <<a href="mailto:rgolan@redhat.com">rgolan@redhat.com</a>> wrote:<br>
><br>
><br>
> On 11 January 2017 at 21:50, Piotr Kliczewski <<a href="mailto:piotr.kliczewski@gmail.com">piotr.kliczewski@gmail.com</a>><br>
> wrote:<br>
>><br>
>> On Wed, Jan 11, 2017 at 9:47 AM, Daniel Belenky <<a href="mailto:dbelenky@redhat.com">dbelenky@redhat.com</a>><br>
>> wrote:<br>
>> > Hi all,<br>
>> ><br>
>> > The following job: test-repo_ovirt_experimental_<wbr>master fails to pass the<br>
>> > basic_suite.<br>
>> > The job was triggered by this merge: <a href="https://gerrit.ovirt.org/#/c/69936/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/<wbr>69936/</a><br>
>> > to<br>
>> > vdsm project.<br>
>> ><br>
>> > The error I suspect is causing this issue:<br>
>> ><br>
>> > 2017-01-11 03:32:26,061-05 DEBUG<br>
>> > [org.ovirt.vdsm.jsonrpc.<wbr>client.internal.<wbr>ResponseWorker] (ResponseWorker)<br>
>> > []<br>
>> > Message received:<br>
>> ><br>
>> > {"jsonrpc":"2.0","error":{"<wbr>code":"<a href="http://192.168.201.2:990178830" rel="noreferrer" target="_blank">192.168.201.2:990178830</a><wbr>","message":"Vds<br>
>> > timeout occured"},"id":null}<br>
>> > 2017-01-11 03:32:26,067-05 ERROR<br>
>> > [org.ovirt.engine.core.dal.<wbr>dbbroker.auditloghandling.<wbr>AuditLogDirector]<br>
>> > (DefaultQuartzScheduler7) [57bc898] Correlation ID: null, Call Stack:<br>
>> > null,<br>
>> > Custom Event ID: -1, Message: VDSM command failed: Message timeout which<br>
>> > can<br>
>> > be caused by communication issues<br>
>> > 2017-01-11 03:32:26,069-05 ERROR<br>
>> > [org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>IrsBrokerCommand]<br>
>> > (DefaultQuartzScheduler7) [57bc898] ERROR, GetStoragePoolInfoVDSCommand(<br>
>> > GetStoragePoolInfoVDSCommandPa<wbr>rameters:{runAsync='true',<br>
>> > storagePoolId='f92af272-934f-<wbr>4327-9db0-afe353e6f61c',<br>
>> > ignoreFailoverLimit='true'}), exception: VDSGenericException:<br>
>> > VDSNetworkException: Message timeout which can be caused by<br>
>> > communication<br>
>> > issues, log id: 2f12b94a<br>
>> > 2017-01-11 03:32:26,069-05 ERROR<br>
>> > [org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>IrsBrokerCommand]<br>
>> > (DefaultQuartzScheduler7) [57bc898] Exception:<br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>VDSNetworkException:<br>
>> > VDSGenericException: VDSNetworkException: Message timeout which can be<br>
>> > caused by communication issues<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>BrokerCommandBase.<wbr>proceedProxyReturnValue(<wbr>BrokerCommandBase.java:188)<br>
>> > [vdsbroker.jar:]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>GetStoragePoolInfoVDSCommand.<wbr>executeIrsBrokerCommand(<wbr>GetStoragePoolInfoVDSCommand.<wbr>java:32)<br>
>> > [vdsbroker.jar:]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>IrsBrokerCommand.lambda$<wbr>executeVDSCommand$0(<wbr>IrsBrokerCommand.java:95)<br>
>> > [vdsbroker.jar:]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.IrsProxy.<wbr>runInControlledConcurrency(<wbr>IrsProxy.java:262)<br>
>> > [vdsbroker.jar:]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>IrsBrokerCommand.<wbr>executeVDSCommand(<wbr>IrsBrokerCommand.java:92)<br>
>> > [vdsbroker.jar:]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.VDSCommandBase.<wbr>executeCommand(VDSCommandBase.<wbr>java:73)<br>
>> > [vdsbroker.jar:]<br>
>> > at<br>
>> > org.ovirt.engine.core.dal.<wbr>VdcCommandBase.execute(<wbr>VdcCommandBase.java:33)<br>
>> > [dal.jar:]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>DefaultVdsCommandExecutor.<wbr>execute(<wbr>DefaultVdsCommandExecutor.<wbr>java:14)<br>
>> > [vdsbroker.jar:]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.ResourceManager.<wbr>runVdsCommand(ResourceManager.<wbr>java:408)<br>
>> > [vdsbroker.jar:]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.IrsProxy.<wbr>proceedStoragePoolStats(<wbr>IrsProxy.java:348)<br>
>> > [vdsbroker.jar:]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.IrsProxy.<wbr>lambda$updatingTimerElapsed$0(<wbr>IrsProxy.java:246)<br>
>> > [vdsbroker.jar:]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.IrsProxy.<wbr>runInControlledConcurrency(<wbr>IrsProxy.java:262)<br>
>> > [vdsbroker.jar:]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.IrsProxy.<wbr>updatingTimerElapsed(IrsProxy.<wbr>java:227)<br>
>> > [vdsbroker.jar:]<br>
>> > at sun.reflect.<wbr>GeneratedMethodAccessor191.<wbr>invoke(Unknown Source)<br>
>> > [:1.8.0_111]<br>
>> > at<br>
>> ><br>
>> > sun.reflect.<wbr>DelegatingMethodAccessorImpl.<wbr>invoke(<wbr>DelegatingMethodAccessorImpl.<wbr>java:43)<br>
>> > [rt.jar:1.8.0_111]<br>
>> > at java.lang.reflect.Method.<wbr>invoke(Method.java:498) [rt.jar:1.8.0_111]<br>
>> > at<br>
>> ><br>
>> > org.ovirt.engine.core.utils.<wbr>timer.JobWrapper.invokeMethod(<wbr>JobWrapper.java:77)<br>
>> > [scheduler.jar:]<br>
>> > at<br>
>> > org.ovirt.engine.core.utils.<wbr>timer.JobWrapper.execute(<wbr>JobWrapper.java:51)<br>
>> > [scheduler.jar:]<br>
>> > at org.quartz.core.JobRunShell.<wbr>run(JobRunShell.java:213) [quartz.jar:]<br>
>> > at<br>
>> > java.util.concurrent.<wbr>Executors$RunnableAdapter.<wbr>call(Executors.java:511)<br>
>> > [rt.jar:1.8.0_111]<br>
>> > at java.util.concurrent.<wbr>FutureTask.run(FutureTask.<wbr>java:266)<br>
>> > [rt.jar:1.8.0_111]<br>
>> > at<br>
>> ><br>
>> > java.util.concurrent.<wbr>ThreadPoolExecutor.runWorker(<wbr>ThreadPoolExecutor.java:1142)<br>
>> > [rt.jar:1.8.0_111]<br>
>> > at<br>
>> ><br>
>> > java.util.concurrent.<wbr>ThreadPoolExecutor$Worker.run(<wbr>ThreadPoolExecutor.java:617)<br>
>> > [rt.jar:1.8.0_111]<br>
>> > at java.lang.Thread.run(Thread.<wbr>java:745) [rt.jar:1.8.0_111]<br>
>> ><br>
>><br>
>> This exception occurs when a response does not arrive within the specified<br>
>> period of time.<br>
>><br>
>> Here is a potential reason for it:<br>
>><br>
>> 2017-01-11 03:23:17,944 ERROR (jsonrpc/5) [storage.TaskManager.Task]<br>
>> (Task='db7a84ba-d89c-4ac7-<wbr>ab7a-b9f409ea7365') Unexpected error<br>
>> (task:870)<br>
>> Traceback (most recent call last):<br>
>> File "/usr/share/vdsm/storage/task.<wbr>py", line 877, in _run<br>
>> return fn(*args, **kargs)<br>
>> File "/usr/lib/python2.7/site-<wbr>packages/vdsm/logUtils.py", line 50, in<br>
>> wrapper<br>
>> res = f(*args, **kwargs)<br>
>> File "/usr/share/vdsm/storage/hsm.<wbr>py", line 3054, in getVolumeInfo<br>
>> volUUID=volUUID).getInfo()<br>
>> File "/usr/share/vdsm/storage/sd.<wbr>py", line 748, in produceVolume<br>
>> volUUID)<br>
>> File "/usr/share/vdsm/storage/<wbr>blockVolume.py", line 415, in __init__<br>
>> manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)<br>
>> File "/usr/share/vdsm/storage/<wbr>blockVolume.py", line 69, in __init__<br>
>> volUUID)<br>
>> File "/usr/share/vdsm/storage/<wbr>volume.py", line 84, in __init__<br>
>> self.validate()<br>
>> File "/usr/share/vdsm/storage/<wbr>blockVolume.py", line 159, in validate<br>
>> raise se.VolumeDoesNotExist(self.<wbr>volUUID)<br>
>> VolumeDoesNotExist: Volume does not exist:<br>
>> (u'5a464296-ebea-4d1f-a299-<wbr>cec45e82f9f3',)<br>
><br>
><br>
><br>
> If that is the case, then we should reply with an error on this API call<br>
> rather than ignore it and let the timeout trigger. Do you think this is what happened?<br>
<br>
</div></div>Reply to what? Let me explain how I see it.<br>
<br>
We send vdsm a request that takes some time to process. During processing,<br>
an issue occurred that forced us to reset the connection, which also reset<br>
tracking and buffers.<br></blockquote><div><br></div><div>What issue?</div><div>I'm concerned about those unexplained disconnections. MOM from VDSM and Engine from VDSM.</div><div>Y.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
When the response finally arrives, we no longer have any tracking information to match it against.<br>
<br>
When the connection is reset, we notify the command code that the<br>
issue occurred,<br>
so it is up to that code to handle it: retry or fail.<br>
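<br>
As a rough illustration of the behaviour described above (a minimal sketch, not the actual jsonrpc client code; the names RequestTracker, PendingCall, ConnectionReset and ResponseTimeout are hypothetical), the following Python shows pending requests tracked by id, a connection reset clearing that tracking and notifying the waiting callers, and a late response finding nothing to match:<br>
<pre>
```python
import threading


class ConnectionReset(Exception):
    """Hypothetical error handed to callers whose connection was reset."""


class ResponseTimeout(Exception):
    """Hypothetical error raised when no response arrives in time."""


class PendingCall:
    """One in-flight request that a caller is waiting on."""

    def __init__(self):
        self._done = threading.Event()
        self._result = None
        self._error = None

    def complete(self, result=None, error=None):
        self._result, self._error = result, error
        self._done.set()

    def wait(self, timeout):
        # Event.wait returns False when the timeout expires first.
        if not self._done.wait(timeout):
            raise ResponseTimeout("no response within %s seconds" % timeout)
        if self._error is not None:
            raise self._error
        return self._result


class RequestTracker:
    """Maps request ids to the calls waiting for their responses."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}

    def register(self, request_id):
        call = PendingCall()
        with self._lock:
            self._pending[request_id] = call
        return call

    def on_response(self, request_id, result):
        with self._lock:
            call = self._pending.pop(request_id, None)
        if call is None:
            # Late response: tracking was cleared, so nothing matches it.
            return False
        call.complete(result=result)
        return True

    def on_connection_reset(self):
        # Drop all tracking and tell every waiting caller what happened,
        # so the caller can decide whether to retry or fail.
        with self._lock:
            calls = list(self._pending.values())
            self._pending.clear()
        for call in calls:
            call.complete(error=ConnectionReset("connection was reset"))
```
</pre>
In this sketch a caller that catches ConnectionReset can retry or fail right away, while a caller that is never notified only learns of the problem when ResponseTimeout fires.<br>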
<div class="HOEnZb"><div class="h5"><br>
>><br>
>><br>
>> ><br>
>> > Attached is a zip file with all artifacts from Jenkins.<br>
>> ><br>
>> > The error I've mentioned above is found in:<br>
>> ><br>
>> ><br>
>> > exported-artifacts/test_logs/<wbr>basic-suite-master/<a href="http://post-004_basic_sanity.py/lago-basic-suite-master-engine/_var_log_ovirt-engine/engine.log" rel="noreferrer" target="_blank">post-004_<wbr>basic_sanity.py/lago-basic-<wbr>suite-master-engine/_var_log_<wbr>ovirt-engine/engine.log</a><br>
>> ><br>
>> > Can someone advise?<br>
>> ><br>
>> > Thanks,<br>
>> > --<br>
>> > Daniel Belenky<br>
>> > RHV DevOps<br>
>> > Red Hat Israel<br>
>> ><br>
>> > ______________________________<wbr>_________________<br>
>> > Devel mailing list<br>
>> > <a href="mailto:Devel@ovirt.org">Devel@ovirt.org</a><br>
>> > <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/devel</a><br>
</div></div></blockquote></div><br></div></div>