<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 12, 2017 at 4:58 PM, Piotr Kliczewski <span dir="ltr">&lt;<a href="mailto:piotr.kliczewski@gmail.com" target="_blank">piotr.kliczewski@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Thu, Jan 12, 2017 at 8:52 AM, Roy Golan &lt;<a href="mailto:rgolan@redhat.com">rgolan@redhat.com</a>&gt; wrote:<br>
&gt;<br>
&gt;<br>
&gt; On 11 January 2017 at 21:50, Piotr Kliczewski &lt;<a href="mailto:piotr.kliczewski@gmail.com">piotr.kliczewski@gmail.com</a>&gt;<br>
&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt; On Wed, Jan 11, 2017 at 9:47 AM, Daniel Belenky &lt;<a href="mailto:dbelenky@redhat.com">dbelenky@redhat.com</a>&gt;<br>
&gt;&gt; wrote:<br>
&gt;&gt; &gt; Hi all,<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; The following job: test-repo_ovirt_experimental_<wbr>master fails to pass the<br>
&gt;&gt; &gt; basic_suite.<br>
&gt;&gt; &gt; The job was triggered by this merge: <a href="https://gerrit.ovirt.org/#/c/69936/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/<wbr>69936/</a><br>
&gt;&gt; &gt; to<br>
&gt;&gt; &gt; vdsm project.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; The error I suspect cause this issue:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; 2017-01-11 03:32:26,061-05 DEBUG<br>
&gt;&gt; &gt; [org.ovirt.vdsm.jsonrpc.<wbr>client.internal.<wbr>ResponseWorker] (ResponseWorker)<br>
&gt;&gt; &gt; []<br>
&gt;&gt; &gt; Message received:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; {&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;error&quot;:{&quot;<wbr>code&quot;:&quot;<a href="http://192.168.201.2:990178830" rel="noreferrer" target="_blank">192.168.201.2:990178830</a><wbr>&quot;,&quot;message&quot;:&quot;Vds<br>
&gt;&gt; &gt; timeout occured&quot;},&quot;id&quot;:null}<br>
&gt;&gt; &gt; 2017-01-11 03:32:26,067-05 ERROR<br>
&gt;&gt; &gt; [org.ovirt.engine.core.dal.<wbr>dbbroker.auditloghandling.<wbr>AuditLogDirector]<br>
&gt;&gt; &gt; (DefaultQuartzScheduler7) [57bc898] Correlation ID: null, Call Stack:<br>
&gt;&gt; &gt; null,<br>
&gt;&gt; &gt; Custom Event ID: -1, Message: VDSM command failed: Message timeout which<br>
&gt;&gt; &gt; can<br>
&gt;&gt; &gt; be caused by communication issues<br>
&gt;&gt; &gt; 2017-01-11 03:32:26,069-05 ERROR<br>
&gt;&gt; &gt; [org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>IrsBrokerCommand]<br>
&gt;&gt; &gt; (DefaultQuartzScheduler7) [57bc898] ERROR, GetStoragePoolInfoVDSCommand(<br>
&gt;&gt; &gt; GetStoragePoolInfoVDSCommandPa<wbr>rameters:{runAsync=&#39;true&#39;,<br>
&gt;&gt; &gt; storagePoolId=&#39;f92af272-934f-<wbr>4327-9db0-afe353e6f61c&#39;,<br>
&gt;&gt; &gt; ignoreFailoverLimit=&#39;true&#39;}), exception: VDSGenericException:<br>
&gt;&gt; &gt; VDSNetworkException: Message timeout which can be caused by<br>
&gt;&gt; &gt; communication<br>
&gt;&gt; &gt; issues, log id: 2f12b94a<br>
&gt;&gt; &gt; 2017-01-11 03:32:26,069-05 ERROR<br>
&gt;&gt; &gt; [org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>IrsBrokerCommand]<br>
&gt;&gt; &gt; (DefaultQuartzScheduler7) [57bc898] Exception:<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>VDSNetworkException:<br>
&gt;&gt; &gt; VDSGenericException: VDSNetworkException: Message timeout which can be<br>
&gt;&gt; &gt; caused by communication issues<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>BrokerCommandBase.<wbr>proceedProxyReturnValue(<wbr>BrokerCommandBase.java:188)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>GetStoragePoolInfoVDSCommand.<wbr>executeIrsBrokerCommand(<wbr>GetStoragePoolInfoVDSCommand.<wbr>java:32)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>IrsBrokerCommand.lambda$<wbr>executeVDSCommand$0(<wbr>IrsBrokerCommand.java:95)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.IrsProxy.<wbr>runInControlledConcurrency(<wbr>IrsProxy.java:262)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.<wbr>IrsBrokerCommand.<wbr>executeVDSCommand(<wbr>IrsBrokerCommand.java:92)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.VDSCommandBase.<wbr>executeCommand(VDSCommandBase.<wbr>java:73)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt; org.ovirt.engine.core.dal.<wbr>VdcCommandBase.execute(<wbr>VdcCommandBase.java:33)<br>
&gt;&gt; &gt; [dal.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>DefaultVdsCommandExecutor.<wbr>execute(<wbr>DefaultVdsCommandExecutor.<wbr>java:14)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.ResourceManager.<wbr>runVdsCommand(ResourceManager.<wbr>java:408)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.IrsProxy.<wbr>proceedStoragePoolStats(<wbr>IrsProxy.java:348)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.IrsProxy.<wbr>lambda$updatingTimerElapsed$0(<wbr>IrsProxy.java:246)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.IrsProxy.<wbr>runInControlledConcurrency(<wbr>IrsProxy.java:262)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.<wbr>vdsbroker.irsbroker.IrsProxy.<wbr>updatingTimerElapsed(IrsProxy.<wbr>java:227)<br>
&gt;&gt; &gt; [vdsbroker.jar:]<br>
&gt;&gt; &gt; at sun.reflect.<wbr>GeneratedMethodAccessor191.<wbr>invoke(Unknown Source)<br>
&gt;&gt; &gt; [:1.8.0_111]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; sun.reflect.<wbr>DelegatingMethodAccessorImpl.<wbr>invoke(<wbr>DelegatingMethodAccessorImpl.<wbr>java:43)<br>
&gt;&gt; &gt; [rt.jar:1.8.0_111]<br>
&gt;&gt; &gt; at java.lang.reflect.Method.<wbr>invoke(Method.java:498) [rt.jar:1.8.0_111]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; org.ovirt.engine.core.utils.<wbr>timer.JobWrapper.invokeMethod(<wbr>JobWrapper.java:77)<br>
&gt;&gt; &gt; [scheduler.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt; org.ovirt.engine.core.utils.<wbr>timer.JobWrapper.execute(<wbr>JobWrapper.java:51)<br>
&gt;&gt; &gt; [scheduler.jar:]<br>
&gt;&gt; &gt; at org.quartz.core.JobRunShell.<wbr>run(JobRunShell.java:213) [quartz.jar:]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt; java.util.concurrent.<wbr>Executors$RunnableAdapter.<wbr>call(Executors.java:511)<br>
&gt;&gt; &gt; [rt.jar:1.8.0_111]<br>
&gt;&gt; &gt; at java.util.concurrent.<wbr>FutureTask.run(FutureTask.<wbr>java:266)<br>
&gt;&gt; &gt; [rt.jar:1.8.0_111]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; java.util.concurrent.<wbr>ThreadPoolExecutor.runWorker(<wbr>ThreadPoolExecutor.java:1142)<br>
&gt;&gt; &gt; [rt.jar:1.8.0_111]<br>
&gt;&gt; &gt; at<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; java.util.concurrent.<wbr>ThreadPoolExecutor$Worker.run(<wbr>ThreadPoolExecutor.java:617)<br>
&gt;&gt; &gt; [rt.jar:1.8.0_111]<br>
&gt;&gt; &gt; at java.lang.Thread.run(Thread.<wbr>java:745) [rt.jar:1.8.0_111]<br>
&gt;&gt; &gt;<br>
&gt;&gt;<br>
&gt;&gt; This exception occurs when a response do not arrive within specified<br>
&gt;&gt; period of time.<br>
&gt;&gt;<br>
&gt;&gt; Here we can have potential reason for it:<br>
&gt;&gt;<br>
&gt;&gt; 2017-01-11 03:23:17,944 ERROR (jsonrpc/5) [storage.TaskManager.Task]<br>
&gt;&gt; (Task=&#39;db7a84ba-d89c-4ac7-<wbr>ab7a-b9f409ea7365&#39;) Unexpected error<br>
&gt;&gt; (task:870)<br>
&gt;&gt; Traceback (most recent call last):<br>
&gt;&gt;   File &quot;/usr/share/vdsm/storage/task.<wbr>py&quot;, line 877, in _run<br>
&gt;&gt;     return fn(*args, **kargs)<br>
&gt;&gt;   File &quot;/usr/lib/python2.7/site-<wbr>packages/vdsm/logUtils.py&quot;, line 50, in<br>
&gt;&gt; wrapper<br>
&gt;&gt;     res = f(*args, **kwargs)<br>
&gt;&gt;   File &quot;/usr/share/vdsm/storage/hsm.<wbr>py&quot;, line 3054, in getVolumeInfo<br>
&gt;&gt;     volUUID=volUUID).getInfo()<br>
&gt;&gt;   File &quot;/usr/share/vdsm/storage/sd.<wbr>py&quot;, line 748, in produceVolume<br>
&gt;&gt;     volUUID)<br>
&gt;&gt;   File &quot;/usr/share/vdsm/storage/<wbr>blockVolume.py&quot;, line 415, in __init__<br>
&gt;&gt;     manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)<br>
&gt;&gt;   File &quot;/usr/share/vdsm/storage/<wbr>blockVolume.py&quot;, line 69, in __init__<br>
&gt;&gt;     volUUID)<br>
&gt;&gt;   File &quot;/usr/share/vdsm/storage/<wbr>volume.py&quot;, line 84, in __init__<br>
&gt;&gt;     self.validate()<br>
&gt;&gt;   File &quot;/usr/share/vdsm/storage/<wbr>blockVolume.py&quot;, line 159, in validate<br>
&gt;&gt;     raise se.VolumeDoesNotExist(self.<wbr>volUUID)<br>
&gt;&gt; VolumeDoesNotExist: Volume does not exist:<br>
&gt;&gt; (u&#39;5a464296-ebea-4d1f-a299-<wbr>cec45e82f9f3&#39;,)<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt; If that is the case then we should reply with an error on this api call and<br>
&gt; not ignore and let the time out trigger. Do you think this what happen?<br>
<br>
</div></div>Reply to what? Let me say how I see it.<br>
<br>
We send to vdsm a request which takes a bit of time to process. During<br>
processing<br>
there was an issue that we had to reset the connection which reset<br>
tracking and buffers.<br></blockquote><div><br></div><div>What issue?</div><div>I&#39;m concerned about those unexplained disconnections. MOM from VDSM and Engine from VDSM.</div><div>Y.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Once response arrive we have no information about it.<br>
<br>
When the connection is reset we let the command code know that the<br>
issue occurred<br>
so it is up to the code to handle the issue - retry or fail.<br>
<div class="HOEnZb"><div class="h5"><br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; Attached is a zip file with all artifacts from Jenkins.<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; The error I&#39;ve mentioned above is found in:<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; exported-artifacts/test_logs/<wbr>basic-suite-master/<a href="http://post-004_basic_sanity.py/lago-basic-suite-master-engine/_var_log_ovirt-engine/engine.log" rel="noreferrer" target="_blank">post-004_<wbr>basic_sanity.py/lago-basic-<wbr>suite-master-engine/_var_log_<wbr>ovirt-engine/engine.log</a><br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; Can some one advise?<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; Thanks,<br>
&gt;&gt; &gt; --<br>
&gt;&gt; &gt; Daniel Belenky<br>
&gt;&gt; &gt; RHV DevOps<br>
&gt;&gt; &gt; Red Hat Israel<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; ______________________________<wbr>_________________<br>
&gt;&gt; &gt; Devel mailing list<br>
&gt;&gt; &gt; <a href="mailto:Devel@ovirt.org">Devel@ovirt.org</a><br>
&gt;&gt; &gt; <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/devel</a><br>
&gt;&gt; ______________________________<wbr>_________________<br>
&gt;&gt; Devel mailing list<br>
&gt;&gt; <a href="mailto:Devel@ovirt.org">Devel@ovirt.org</a><br>
&gt;&gt; <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/devel</a><br>
&gt;<br>
&gt;<br>
______________________________<wbr>_________________<br>
Devel mailing list<br>
<a href="mailto:Devel@ovirt.org">Devel@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/devel</a><br>
</div></div></blockquote></div><br></div></div>