<div dir="ltr"><div><div>Heartbeat issue caused channel being closed and set to null which caused a surprise for a reactor.<br></div>It seems that we may want to schedule closure of channel in this situation instead of calling it explicitly.<br><br></div>What was the cause of the heartbeat issues?<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Oct 27, 2016 at 4:12 AM, Allon Mureinik <span dir="ltr">&lt;<a href="mailto:amureini@redhat.com" target="_blank">amureini@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">Seems like we cleared up the engine issues related to the recent injection changes.</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">I am now seeing stop errors, e.g.:</div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace"><pre style="color:rgb(0,0,0)">{&quot;jsonrpc&quot;: &quot;2.0&quot;, &quot;id&quot;: &quot;ea0c564f-bc17-4fc2-8f1b-<wbr>67c4d28257c6&quot;, &quot;result&quot;: {&quot;cpuStatistics&quot;: {&quot;1&quot;: {&quot;cpuUser&quot;: &quot;3.07&quot;, &quot;nodeIndex&quot;: 0, &quot;cpuSys&quot;: &quot;3.00&quot;, &quot;cpuIdle&quot;: &quot;93.93&quot;}, &quot;0&quot;: {&quot;cpuUser&quot;: &quot;1.67&quot;, &quot;nodeIndex&quot;: 0, &quot;cpuSys&quot;: &quot;2.07&quot;, &quot;cpuIdle&quot;: &quot;96.26&quot;}}, &quot;numaNodeMemFree&quot;: {&quot;0&quot;: {&quot;memPercent&quot;: 83, &quot;memFree&quot;: &quot;359&quot;}}, &quot;memShared&quot;: 0, &quot;thpState&quot;: &quot;always&quot;, &quot;ksmMergeAcrossNodes&quot;: true, &quot;vmCount&quot;: 0, &quot;memUsed&quot;: &quot;20&quot;, &quot;storageDomains&quot;: {&quot;b2bb3220-1eb3-426a-90c2-<wbr>5e236aefbe1a&quot;: {&quot;code&quot;: 0, &quot;actual&quot;: true, &quot;version&quot;: 0, &quot;acquired&quot;: true, &quot;delay&quot;: &quot;0.000840117&quot;, &quot;lastCheck&quot;: &quot;7.1&quot;, &quot;valid&quot;: true}, &quot;3130195a-73f9-4490-b554-<wbr>98a9205cead6&quot;: {&quot;code&quot;: 0, &quot;actual&quot;: true, &quot;version&quot;: 4, &quot;acquired&quot;: true, &quot;delay&quot;: &quot;0.00150771&quot;, &quot;lastCheck&quot;: &quot;7.5&quot;, &quot;valid&quot;: true}, &quot;1a9e202b-83b7-4bdc-9b0c-<wbr>e76b83676068&quot;: {&quot;code&quot;: 0, &quot;actual&quot;: true, &quot;version&quot;: 4, &quot;acquired&quot;: true, &quot;delay&quot;: &quot;0.000590956&quot;,
2016-10-26 21:51:09,878 DEBUG [org.ovirt.engine.core.utils.<wbr>timer.FixedDelayJobListener] (DefaultQuartzScheduler7) [6d206bd1] Rescheduling DEFAULT.org.ovirt.engine.core.<wbr>bll.tasks.<wbr>CommandCallbacksPoller.<wbr>invokeCallbackMethods#-<wbr>9223372036854775783 as there is no unfired trigger.
2016-10-26 21:51:28,705 DEBUG [org.ovirt.vdsm.jsonrpc.<wbr>client.reactors.ReactorClient] (SSL Stomp Reactor) [383dd6a0] Heartbeat exceeded. Closing channel
2016-10-26 21:51:28,708 DEBUG [org.ovirt.vdsm.jsonrpc.<wbr>client.reactors.Reactor] (SSL Stomp Reactor) [383dd6a0] Internal server error: null: java.lang.NullPointerException
        at org.ovirt.vdsm.jsonrpc.client.<wbr>reactors.SSLClient.write(<wbr>SSLClient.java:102) [vdsm-jsonrpc-java-client.jar:<wbr>]
        at org.ovirt.vdsm.jsonrpc.client.<wbr>reactors.ReactorClient.<wbr>processOutgoing(ReactorClient.<wbr>java:245) [vdsm-jsonrpc-java-client.jar:<wbr>]
        at org.ovirt.vdsm.jsonrpc.client.<wbr>reactors.ReactorClient.<wbr>process(ReactorClient.java:<wbr>208) [vdsm-jsonrpc-java-client.jar:<wbr>]
        at org.ovirt.vdsm.jsonrpc.client.<wbr>reactors.SSLClient.process(<wbr>SSLClient.java:125) [vdsm-jsonrpc-java-client.jar:<wbr>]
        at org.ovirt.vdsm.jsonrpc.client.<wbr>reactors.Reactor.<wbr>processChannels(Reactor.java:<wbr>89) [vdsm-jsonrpc-java-client.jar:<wbr>]
        at org.ovirt.vdsm.jsonrpc.client.<wbr>reactors.Reactor.run(Reactor.<wbr>java:65) [vdsm-jsonrpc-java-client.jar:<wbr>]</pre><pre style="color:rgb(0,0,0)"><br></pre><pre style="color:rgb(0,0,0)">Piotr - any idea?</pre></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 26, 2016 at 9:34 PM, Nadav Goldin <span dir="ltr">&lt;<a href="mailto:ngoldin@redhat.com" target="_blank">ngoldin@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Its running now:<br>
<a href="http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-fc24-x86_64/1037/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/o<wbr>virt-engine_master_build-artif<wbr>acts-fc24-x86_64/1037/</a>,<br>
when it finishes it will trigger the deploy job which will trigger the<br>
experimental job. The build_artifacts job is triggered approximately<br>
every 2 hours(depends on the load).<br>
Inside the build-artifacts job you can see the last<br>
commit(<a href="https://gerrit.ovirt.org/65776" rel="noreferrer" target="_blank">https://gerrit.ovirt.or<wbr>g/65776</a>, which is already after your<br>
commit)<br>
<div class="m_4750592846160278338HOEnZb"><div class="m_4750592846160278338h5"><br>
<br>
On Wed, Oct 26, 2016 at 9:26 PM, Allon Mureinik &lt;<a href="mailto:amureini@redhat.com" target="_blank">amureini@redhat.com</a>&gt; wrote:<br>
&gt; @Infra - the last experimental job I saw was from ~17:30 local Israel time.<br>
&gt; Any idea why another one isn&#39;t being triggered (or am I just being daft)?<br>
&gt;<br>
&gt; On Wed, Oct 26, 2016 at 6:27 PM, Allon Mureinik &lt;<a href="mailto:amureini@redhat.com" target="_blank">amureini@redhat.com</a>&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt; Yipes.<br>
&gt;&gt; [1] should fix that, waiting for the CI to run to merge.<br>
&gt;&gt;<br>
&gt;&gt; [1] <a href="https://gerrit.ovirt.org/#/c/65768/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/6<wbr>5768/</a><br>
&gt;&gt;<br>
&gt;&gt; On Wed, Oct 26, 2016 at 3:42 PM, Nadav Goldin &lt;<a href="mailto:ngoldin@redhat.com" target="_blank">ngoldin@redhat.com</a>&gt; wrote:<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Unfortunately it is still failing, see[1], the repository used was<br>
&gt;&gt;&gt; built from commit [2]. If you want to check the logs same links<br>
&gt;&gt;&gt; apply(just replace build number 2759-&gt;2782)<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; [1] <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/2782" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/t<wbr>est-repo_ovirt_experimental_ma<wbr>ster/2782</a><br>
&gt;&gt;&gt; [2] <a href="https://gerrit.ovirt.org/#/c/65740/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/6<wbr>5740/</a><br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; On Wed, Oct 26, 2016 at 11:33 AM, Tal Nisan &lt;<a href="mailto:tnisan@redhat.com" target="_blank">tnisan@redhat.com</a>&gt; wrote:<br>
&gt;&gt;&gt; &gt; Should be fixed now on master as those patches were just merged:<br>
&gt;&gt;&gt; &gt;<br>
&gt;&gt;&gt; &gt; <a href="https://gerrit.ovirt.org/#/c/65738/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/6<wbr>5738/</a> - &quot;Move InjectorRule to VdsBroker&quot;:<br>
&gt;&gt;&gt; &gt; aligns the InjectorRule&#39;s package with the Injector so it can<br>
&gt;&gt;&gt; &gt; essentially be<br>
&gt;&gt;&gt; &gt; used wherever the Injector is used [+12, -4]<br>
&gt;&gt;&gt; &gt; <a href="https://gerrit.ovirt.org/#/c/65739" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/6<wbr>5739</a> - &quot;InjectorRule: override injector<br>
&gt;&gt;&gt; &gt; anyway&quot;: Fixes a bug in InjectorRule where the Injector is only mocked<br>
&gt;&gt;&gt; &gt; if<br>
&gt;&gt;&gt; &gt; you call InjectorRule.bind [+6, -1]<br>
&gt;&gt;&gt; &gt; <a href="https://gerrit.ovirt.org/#/c/65740" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/6<wbr>5740</a> - &quot;core: InjectorRule for injecting<br>
&gt;&gt;&gt; &gt; members&quot;: Fixes a bug in InjectorRule to allow using<br>
&gt;&gt;&gt; &gt; Injector.injectMembers<br>
&gt;&gt;&gt; &gt; when using it [+18, -5]<br>
&gt;&gt;&gt; &gt; <a href="https://gerrit.ovirt.org/#/c/65725" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/6<wbr>5725</a> - &quot;core: Fix AuditLogging&quot;: The<br>
&gt;&gt;&gt; &gt; actual<br>
&gt;&gt;&gt; &gt; fix. Basically, goes over all the places that create an<br>
&gt;&gt;&gt; &gt; AuditLoggableBase<br>
&gt;&gt;&gt; &gt; that needs injecting and take care of it [+155, -160]<br>
&gt;&gt;&gt; &gt;<br>
&gt;&gt;&gt; &gt;<br>
&gt;&gt;&gt; &gt; On Wed, Oct 26, 2016 at 10:04 AM, Nadav Goldin &lt;<a href="mailto:ngoldin@redhat.com" target="_blank">ngoldin@redhat.com</a>&gt;<br>
&gt;&gt;&gt; &gt; wrote:<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; Hi,<br>
&gt;&gt;&gt; &gt;&gt; We have new failure on OST from patches merged to master yesterday,<br>
&gt;&gt;&gt; &gt;&gt; the failure started after the merge of [1], but as there were quite a<br>
&gt;&gt;&gt; &gt;&gt; few patches merged quickly I can&#39;t make sure it is the one causing<br>
&gt;&gt;&gt; &gt;&gt; it(OST aren&#39;t ran per-patch).<br>
&gt;&gt;&gt; &gt;&gt; The test that fails is [2] when attempting to start the VM.<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; The error from the API side:<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; RequestError:<br>
&gt;&gt;&gt; &gt;&gt; status: 500<br>
&gt;&gt;&gt; &gt;&gt; reason: Internal Server Error<br>
&gt;&gt;&gt; &gt;&gt; detail: javax.ejb.EJBException: java.lang.NullPointerException<br>
&gt;&gt;&gt; &gt;&gt;     at<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; org.jboss.as.ejb3.tx.CMTTxInte<wbr>rceptor.handleExceptionInNoTx(<wbr>CMTTxInterceptor.java:213)<br>
&gt;&gt;&gt; &gt;&gt;     at<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; org.jboss.as.ejb3.tx.CMTTxInte<wbr>rceptor.invokeInNoTx(CMTTxInte<wbr>rceptor.java:265)<br>
&gt;&gt;&gt; &gt;&gt;     at<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; org.jboss.as.ejb3.tx.CMTTxInte<wbr>rceptor.supports(CMTTxIntercep<wbr>tor.java:374)<br>
&gt;&gt;&gt; &gt;&gt;     at<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; org.jboss.as.ejb3.tx.CMTTxInte<wbr>rceptor.processInvocation(CMTT<wbr>xInterceptor.java:243)<br>
&gt;&gt;&gt; &gt;&gt;     at<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; org.jboss.invocation.Intercept<wbr>orContext.proceed(InterceptorC<wbr>ontext.java:340)<br>
&gt;&gt;&gt; &gt;&gt; ....<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; In the engine logs there are a few &#39;java.lang.NullPointerExceptio<wbr>n&#39;<br>
&gt;&gt;&gt; &gt;&gt; errors:<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; 2016-10-25 11:53:52,845 INFO<br>
&gt;&gt;&gt; &gt;&gt; [org.ovirt.engine.core.dal.dbb<wbr>roker.auditloghandling.AuditLo<wbr>gableBase]<br>
&gt;&gt;&gt; &gt;&gt; (org.ovirt.thread.pool-6-threa<wbr>d-2) [5e6a88be] Failed to get vds<br>
&gt;&gt;&gt; &gt;&gt; &#39;d60db21f-95f0-487b-9f17-44861<wbr>e2610a7&#39;, error: null<br>
&gt;&gt;&gt; &gt;&gt; 2016-10-25 11:53:52,864 DEBUG<br>
&gt;&gt;&gt; &gt;&gt; [org.ovirt.engine.core.utils.t<wbr>imer.FixedDelayJobListener]<br>
&gt;&gt;&gt; &gt;&gt; (DefaultQuartzScheduler5) [] Rescheduling<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; DEFAULT.org.ovirt.engine.core.<wbr>bll.tasks.AsyncTaskManager.tim<wbr>erElapsed#-9223372036854775787<br>
&gt;&gt;&gt; &gt;&gt; as there is no unfired trigger.<br>
&gt;&gt;&gt; &gt;&gt; ...<br>
&gt;&gt;&gt; &gt;&gt; 2016-10-25 11:53:52,845 DEBUG<br>
&gt;&gt;&gt; &gt;&gt; [org.ovirt.engine.core.dal.dbb<wbr>roker.auditloghandling.AuditLo<wbr>gableBase]<br>
&gt;&gt;&gt; &gt;&gt; (org.ovirt.thread.pool-6-threa<wbr>d-2) [5e6a88be] Exception:<br>
&gt;&gt;&gt; &gt;&gt; java.lang.NullPointerException<br>
&gt;&gt;&gt; &gt;&gt;         at<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; org.ovirt.engine.core.dal.dbbr<wbr>oker.auditloghandling.AuditLog<wbr>ableBase.getVdsStatic(AuditLog<wbr>ableBase.java:633)<br>
&gt;&gt;&gt; &gt;&gt; [dal.jar:]<br>
&gt;&gt;&gt; &gt;&gt;         at<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; org.ovirt.engine.core.dal.dbbr<wbr>oker.auditloghandling.AuditLog<wbr>ableBase.getVdsName(AuditLogab<wbr>leBase.java:504)<br>
&gt;&gt;&gt; &gt;&gt; [dal.jar:]<br>
&gt;&gt;&gt; &gt;&gt; ...<br>
&gt;&gt;&gt; &gt;&gt; 2016-10-25 11:53:52,837 INFO<br>
&gt;&gt;&gt; &gt;&gt; [org.ovirt.engine.core.dal.dbb<wbr>roker.auditloghandling.AuditLo<wbr>gableBase]<br>
&gt;&gt;&gt; &gt;&gt; (org.ovirt.thread.pool-6-threa<wbr>d-2) [5e6a88be] Failed to get vds<br>
&gt;&gt;&gt; &gt;&gt; &#39;d60db21f-95f0-487b-9f17-44861<wbr>e2610a7&#39;, error: null<br>
&gt;&gt;&gt; &gt;&gt; 2016-10-25 11:53:52,837 DEBUG<br>
&gt;&gt;&gt; &gt;&gt; [org.ovirt.engine.core.dal.dbb<wbr>roker.auditloghandling.AuditLo<wbr>gableBase]<br>
&gt;&gt;&gt; &gt;&gt; (org.ovirt.thread.pool-6-threa<wbr>d-2) [5e6a88be] Exception:<br>
&gt;&gt;&gt; &gt;&gt; java.lang.NullPointerException<br>
&gt;&gt;&gt; &gt;&gt;         at<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; org.ovirt.engine.core.dal.dbbr<wbr>oker.auditloghandling.AuditLog<wbr>ableBase.getVdsStatic(AuditLog<wbr>ableBase.java:633)<br>
&gt;&gt;&gt; &gt;&gt; [dal.jar:]<br>
&gt;&gt;&gt; &gt;&gt;         at<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; org.ovirt.engine.core.dal.dbbr<wbr>oker.auditloghandling.AuditLog<wbr>ableBase.getVdsName(AuditLogab<wbr>leBase.java:504)<br>
&gt;&gt;&gt; &gt;&gt; [dal.jar:]<br>
&gt;&gt;&gt; &gt;&gt; ...<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; The full engine logs can be found here[3] and the entire test suite<br>
&gt;&gt;&gt; &gt;&gt; logs here[4].<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; Can anyone have a look?<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; Thanks,<br>
&gt;&gt;&gt; &gt;&gt; Nadav.<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; [1] <a href="https://gerrit.ovirt.org/#/c/65198/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/6<wbr>5198/</a><br>
&gt;&gt;&gt; &gt;&gt; [2]<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; <a href="https://github.com/oVirt/ovirt-system-tests/blob/master/basic_suite_master/test-scenarios/004_basic_sanity.py#L322" rel="noreferrer" target="_blank">https://github.com/oVirt/ovirt<wbr>-system-tests/blob/master/<wbr>basic_suite_master/test-<wbr>scenarios/004_basic_sanity.py#<wbr>L322</a><br>
&gt;&gt;&gt; &gt;&gt; [3]<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/2759/artifact/exported-artifacts/basic_suite_master.sh-fc24/exported-artifacts/test_logs/basic_suite_master/post-004_basic_sanity.py/*zip*/post-004_basic_sanity.py.zip" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/t<wbr>est-repo_ovirt_experimental_ma<wbr>ster/2759/artifact/exported-ar<wbr>tifacts/basic_suite_master.sh-<wbr>fc24/exported-artifacts/test_<wbr>logs/basic_suite_master/post-<wbr>004_basic_sanity.py/*zip*/<wbr>post-004_basic_sanity.py.zip</a><br>
&gt;&gt;&gt; &gt;&gt; [4]<br>
&gt;&gt;&gt; &gt;&gt;<br>
&gt;&gt;&gt; &gt;&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/2759/artifact/exported-artifacts/basic_suite_master.sh-fc24/exported-artifacts/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/t<wbr>est-repo_ovirt_experimental_ma<wbr>ster/2759/artifact/exported-ar<wbr>tifacts/basic_suite_master.sh-<wbr>fc24/exported-artifacts/</a><br>
&gt;&gt;&gt; &gt;&gt; ______________________________<wbr>_________________<br>
&gt;&gt;&gt; &gt;&gt; Devel mailing list<br>
&gt;&gt;&gt; &gt;&gt; <a href="mailto:Devel@ovirt.org" target="_blank">Devel@ovirt.org</a><br>
&gt;&gt;&gt; &gt;&gt; <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman<wbr>/listinfo/devel</a><br>
&gt;&gt;&gt; &gt;<br>
&gt;&gt;&gt; &gt;<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;<br>
</div></div></blockquote></div><br></div>
</blockquote></div><br></div>