<div dir="ltr"><div><div><div>Piotr,<br><br></div>How do you identify the connection between the 2 message, why do you think that the timeout is related to the first message you&#39;ve mentioned?<br><br></div>Regards,<br></div>Yevgeny<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 15, 2017 at 12:40 PM, Piotr Kliczewski <span dir="ltr">&lt;<a href="mailto:piotr.kliczewski@gmail.com" target="_blank">piotr.kliczewski@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I agree with Nir that the above issue is not related to the job failure.<br>
<br>
I see that there is a timeout for:<br>
<br>
2017-05-14 04:10:04,215-04 DEBUG<br>
[org.ovirt.vdsm.jsonrpc.<wbr>client.reactors.stomp.<wbr>StompCommonClient]<br>
(org.ovirt.thread.pool-7-<wbr>thread-12) [] Message sent: SEND<br>
destination:jms.topic.vdsm_<wbr>requests<br>
content-length:377<br>
reply-to:jms.topic.vdsm_<wbr>responses<br>
<br>
&lt;JsonRpcRequest id: &quot;846f4741-5c7d-4396-8ed7-<wbr>ce995d426728&quot;, method:<br>
Host.setupNetworks, params: {networks={Migration_Net={<wbr>vlan=200,<br>
ipv6autoconf=false, nic=eth0, bridged=false, dhcpv6=false, mtu=1500,<br>
ipv6addr=2001:0db8:85a3:0000:<wbr>0000:574c:14ea:0a02/64, switch=legacy}},<br>
bondings={}, options={connectivityTimeout=<wbr>120,<br>
connectivityCheck=true}}&gt;<br>
<br>
<br>
and the timeout occurred at:<br>
<br>
2017-05-14 04:10:07,535-04 DEBUG<br>
[org.ovirt.vdsm.jsonrpc.<wbr>client.internal.<wbr>ResponseWorker]<br>
(ResponseWorker) [] Message received:<br>
{&quot;jsonrpc&quot;:&quot;2.0&quot;,&quot;error&quot;:{&quot;<wbr>code&quot;:&quot;lago-basic-suite-<wbr>master-host1:286640862&quot;,&quot;<wbr>message&quot;:&quot;Vds<br>
timeout occured&quot;},&quot;id&quot;:null}<br>
<br>
I see in supervdsm log:<br>
<br>
MainProcess|jsonrpc/1::INFO::<wbr>2017-05-14<br>
04:10:05,566::<wbr>netconfpersistence::52::root::<wbr>(setNetwork) Adding<br>
network Migration_Net({u&#39;ipv6autoconf&#39;<wbr>: False, &#39;nameservers&#39;: [],<br>
u&#39;nic&#39;: u&#39;eth0&#39;, u&#39;vlan&#39;: 200, u&#39;mtu&#39;: 1500, u&#39;switch&#39;: u&#39;legacy&#39;,<br>
u&#39;dhcpv6&#39;: False, u&#39;bridged&#39;: False, u&#39;ipv6addr&#39;:<br>
u&#39;2001:0db8:85a3:0000:0000:<wbr>574c:14ea:0a02/64&#39;, &#39;defaultRoute&#39;: False,<br>
&#39;bootproto&#39;: &#39;none&#39;})<br>
...<br>
MainProcess|jsonrpc/1::<wbr>WARNING::2017-05-14<br>
04:10:05,580::ifcfg::247::<wbr>root::(_addSourceRoute) Invalid input for<br>
source routing: name=eth0.200, addr=None, netmask=None, gateway=None<br>
netlink/events::DEBUG::2017-<wbr>05-14<br>
04:10:05,581::concurrent::184:<wbr>:root::(run) START thread<br>
&lt;Thread(netlink/events, started daemon 139829765465856)&gt; (func=&lt;bound<br>
method Monitor._scan of &lt;vdsm.network.netlink.monitor.<wbr>Monitor object<br>
at 0x7f2c8c01ebd0&gt;&gt;, args=(), kwargs={})<br>
MainProcess|jsonrpc/1::<wbr>WARNING::2017-05-14<br>
04:10:05,581::ifcfg::894::<wbr>root::(_ignore_missing_device) eth0.200<br>
didn&#39;t exist (yet) and so IPv6 wasn&#39;t enabled.<br>
<br>
My question is why only 3 seconds to timeout setupNetworks.<br>
<br>
<br>
<br>
I found one more exception not related to the above issue which we should fix:<br>
<br>
2017-05-14 04:08:08,924-04 DEBUG<br>
[org.ovirt.engine.core.utils.<wbr>timer.SchedulerUtilQuartzImpl]<br>
(DefaultQuartzScheduler10) [58fd90c1] Exception:<br>
java.lang.reflect.<wbr>InvocationTargetException<br>
at sun.reflect.<wbr>GeneratedMethodAccessor178.<wbr>invoke(Unknown Source) [:1.8.0_131]<br>
at sun.reflect.<wbr>DelegatingMethodAccessorImpl.<wbr>invoke(<wbr>DelegatingMethodAccessorImpl.<wbr>java:43)<br>
[rt.jar:1.8.0_131]<br>
at java.lang.reflect.Method.<wbr>invoke(Method.java:498) [rt.jar:1.8.0_131]<br>
at org.ovirt.engine.core.utils.<wbr>timer.JobWrapper.invokeMethod(<wbr>JobWrapper.java:83)<br>
[scheduler.jar:]<br>
at org.ovirt.engine.core.utils.<wbr>timer.JobWrapper.execute(<wbr>JobWrapper.java:55)<br>
[scheduler.jar:]<br>
at org.quartz.core.JobRunShell.<wbr>run(JobRunShell.java:213) [quartz.jar:]<br>
at java.util.concurrent.<wbr>Executors$RunnableAdapter.<wbr>call(Executors.java:511)<br>
[rt.jar:1.8.0_131]<br>
at java.util.concurrent.<wbr>FutureTask.run(FutureTask.<wbr>java:266) [rt.jar:1.8.0_131]<br>
at java.util.concurrent.<wbr>ThreadPoolExecutor.runWorker(<wbr>ThreadPoolExecutor.java:1142)<br>
[rt.jar:1.8.0_131]<br>
at java.util.concurrent.<wbr>ThreadPoolExecutor$Worker.run(<wbr>ThreadPoolExecutor.java:617)<br>
[rt.jar:1.8.0_131]<br>
at java.lang.Thread.run(Thread.<wbr>java:748) [rt.jar:1.8.0_131]<br>
Caused by: java.lang.NullPointerException<br>
at org.ovirt.engine.core.<wbr>vdsbroker.monitoring.<wbr>VmStatsRefresher.lambda$<wbr>processDevices$0(<wbr>VmStatsRefresher.java:44)<br>
[vdsbroker.jar:]<br>
at java.util.stream.<wbr>ReferencePipeline$2$1.accept(<wbr>ReferencePipeline.java:174)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>ReferencePipeline$3$1.accept(<wbr>ReferencePipeline.java:193)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>ReferencePipeline$2$1.accept(<wbr>ReferencePipeline.java:175)<br>
[rt.jar:1.8.0_131]<br>
at java.util.ArrayList$<wbr>ArrayListSpliterator.<wbr>forEachRemaining(ArrayList.<wbr>java:1374)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>AbstractPipeline.copyInto(<wbr>AbstractPipeline.java:481)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>AbstractPipeline.<wbr>wrapAndCopyInto(<wbr>AbstractPipeline.java:471)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.ForEachOps$<wbr>ForEachOp.evaluateSequential(<wbr>ForEachOps.java:151)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.ForEachOps$<wbr>ForEachOp$OfRef.<wbr>evaluateSequential(ForEachOps.<wbr>java:174)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>AbstractPipeline.evaluate(<wbr>AbstractPipeline.java:234)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>ReferencePipeline.forEach(<wbr>ReferencePipeline.java:418)<br>
[rt.jar:1.8.0_131]<br>
at org.ovirt.engine.core.<wbr>vdsbroker.monitoring.<wbr>VmStatsRefresher.<wbr>processDevices(<wbr>VmStatsRefresher.java:46)<br>
[vdsbroker.jar:]<br>
at org.ovirt.engine.core.<wbr>vdsbroker.monitoring.<wbr>PollVmStatsRefresher.poll(<wbr>PollVmStatsRefresher.java:44)<br>
[vdsbroker.jar:]<br>
... 11 more<br>
<br>
Thanks,<br>
Piotr<br>
<div><div class="h5"><br>
<br>
On Sun, May 14, 2017 at 6:12 PM, Nir Soffer &lt;<a href="mailto:nsoffer@redhat.com">nsoffer@redhat.com</a>&gt; wrote:<br>
&gt; On Sun, May 14, 2017 at 5:14 PM Shlomo Ben David &lt;<a href="mailto:sbendavi@redhat.com">sbendavi@redhat.com</a>&gt;<br>
&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt; Link to suspected patches: N/A<br>
&gt;&gt; Link to Job:<br>
&gt;&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/6665/</a><br>
&gt;&gt; Link to all logs:<br>
&gt;&gt; <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/6665/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7/test_logs/basic-suite-<wbr>master/post-006_migrations.py/</a><br>
&gt;&gt;<br>
&gt;&gt; Error snippet from the log:<br>
&gt;<br>
&gt;<br>
&gt;&gt; Traceback (most recent call last):<br>
&gt;&gt;<br>
&gt;&gt;   File &quot;/usr/lib/python2.7/site-<wbr>packages/vdsm/storage/task.py&quot;<wbr>, line 878,<br>
&gt;&gt; in _run<br>
&gt;&gt;     return fn(*args, **kargs)<br>
&gt;&gt;   File &quot;/usr/lib/python2.7/site-<wbr>packages/vdsm/logUtils.py&quot;, line 52, in<br>
&gt;&gt; wrapper<br>
&gt;&gt;     res = f(*args, **kwargs)<br>
&gt;&gt;   File &quot;/usr/share/vdsm/storage/hsm.<wbr>py&quot;, line 2132, in getAllTasksStatuses<br>
&gt;&gt;     allTasksStatus = self._pool.<wbr>getAllTasksStatuses()<br>
&gt;&gt;   File &quot;/usr/lib/python2.7/site-<wbr>packages/vdsm/storage/<wbr>securable.py&quot;, line<br>
&gt;&gt; 77, in wrapper<br>
&gt;&gt;     raise SecureError(&quot;Secured object is not in safe state&quot;)<br>
&gt;&gt; SecureError: Secured object is not in safe state<br>
&gt;<br>
&gt;<br>
&gt; This means that getAllTasksStatuses (SPM verb) was called on a host<br>
&gt; which is not the SPM. This is an error in the caller, not in vdsm.<br>
&gt;<br>
&gt; We already have a bug abut this:<br>
&gt; <a href="https://bugzilla.redhat.com/1426429" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/<wbr>1426429</a><br>
&gt;<br>
&gt; However this build logs do not contain enough info. These<br>
&gt; patches adds more info that can help to identify the caller:<br>
&gt; <a href="https://gerrit.ovirt.org/76771" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/76771</a><br>
&gt;<br>
&gt; This error is not related to the real error in this build.<br>
&gt;<br>
&gt; Nir<br>
&gt;<br>
</div></div>&gt; ______________________________<wbr>_________________<br>
&gt; Devel mailing list<br>
&gt; <a href="mailto:Devel@ovirt.org">Devel@ovirt.org</a><br>
&gt; <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/devel</a><br>
______________________________<wbr>_________________<br>
Devel mailing list<br>
<a href="mailto:Devel@ovirt.org">Devel@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/devel</a><br>
</blockquote></div><br></div>