<div dir="ltr"><div><div><div>Piotr,<br><br></div>How do you identify the connection between the 2 message, why do you think that the timeout is related to the first message you've mentioned?<br><br></div>Regards,<br></div>Yevgeny<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 15, 2017 at 12:40 PM, Piotr Kliczewski <span dir="ltr"><<a href="mailto:piotr.kliczewski@gmail.com" target="_blank">piotr.kliczewski@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I agree with Nir that the above issue is not related to the job failure.<br>
<br>
I see that there is a timeout for:<br>
<br>
2017-05-14 04:10:04,215-04 DEBUG<br>
[org.ovirt.vdsm.jsonrpc.<wbr>client.reactors.stomp.<wbr>StompCommonClient]<br>
(org.ovirt.thread.pool-7-<wbr>thread-12) [] Message sent: SEND<br>
destination:jms.topic.vdsm_<wbr>requests<br>
content-length:377<br>
reply-to:jms.topic.vdsm_<wbr>responses<br>
<br>
<JsonRpcRequest id: "846f4741-5c7d-4396-8ed7-<wbr>ce995d426728", method:<br>
Host.setupNetworks, params: {networks={Migration_Net={<wbr>vlan=200,<br>
ipv6autoconf=false, nic=eth0, bridged=false, dhcpv6=false, mtu=1500,<br>
ipv6addr=2001:0db8:85a3:0000:<wbr>0000:574c:14ea:0a02/64, switch=legacy}},<br>
bondings={}, options={connectivityTimeout=<wbr>120,<br>
connectivityCheck=true}}><br>
<br>
<br>
and the timeout occurred at:<br>
<br>
2017-05-14 04:10:07,535-04 DEBUG<br>
[org.ovirt.vdsm.jsonrpc.<wbr>client.internal.<wbr>ResponseWorker]<br>
(ResponseWorker) [] Message received:<br>
{"jsonrpc":"2.0","error":{"<wbr>code":"lago-basic-suite-<wbr>master-host1:286640862","<wbr>message":"Vds<br>
timeout occured"},"id":null}<br>
<br>
I see in supervdsm log:<br>
<br>
MainProcess|jsonrpc/1::INFO::<wbr>2017-05-14<br>
04:10:05,566::<wbr>netconfpersistence::52::root::<wbr>(setNetwork) Adding<br>
network Migration_Net({u'ipv6autoconf'<wbr>: False, 'nameservers': [],<br>
u'nic': u'eth0', u'vlan': 200, u'mtu': 1500, u'switch': u'legacy',<br>
u'dhcpv6': False, u'bridged': False, u'ipv6addr':<br>
u'2001:0db8:85a3:0000:0000:<wbr>574c:14ea:0a02/64', 'defaultRoute': False,<br>
'bootproto': 'none'})<br>
...<br>
MainProcess|jsonrpc/1::<wbr>WARNING::2017-05-14<br>
04:10:05,580::ifcfg::247::<wbr>root::(_addSourceRoute) Invalid input for<br>
source routing: name=eth0.200, addr=None, netmask=None, gateway=None<br>
netlink/events::DEBUG::2017-<wbr>05-14<br>
04:10:05,581::concurrent::184:<wbr>:root::(run) START thread<br>
<Thread(netlink/events, started daemon 139829765465856)> (func=<bound<br>
method Monitor._scan of <vdsm.network.netlink.monitor.<wbr>Monitor object<br>
at 0x7f2c8c01ebd0>>, args=(), kwargs={})<br>
MainProcess|jsonrpc/1::<wbr>WARNING::2017-05-14<br>
04:10:05,581::ifcfg::894::<wbr>root::(_ignore_missing_device) eth0.200<br>
didn't exist (yet) and so IPv6 wasn't enabled.<br>
<br>
My question is why only 3 seconds to timeout setupNetworks.<br>
<br>
<br>
<br>
I found one more exception not related to the above issue which we should fix:<br>
<br>
2017-05-14 04:08:08,924-04 DEBUG<br>
[org.ovirt.engine.core.utils.<wbr>timer.SchedulerUtilQuartzImpl]<br>
(DefaultQuartzScheduler10) [58fd90c1] Exception:<br>
java.lang.reflect.<wbr>InvocationTargetException<br>
at sun.reflect.<wbr>GeneratedMethodAccessor178.<wbr>invoke(Unknown Source) [:1.8.0_131]<br>
at sun.reflect.<wbr>DelegatingMethodAccessorImpl.<wbr>invoke(<wbr>DelegatingMethodAccessorImpl.<wbr>java:43)<br>
[rt.jar:1.8.0_131]<br>
at java.lang.reflect.Method.<wbr>invoke(Method.java:498) [rt.jar:1.8.0_131]<br>
at org.ovirt.engine.core.utils.<wbr>timer.JobWrapper.invokeMethod(<wbr>JobWrapper.java:83)<br>
[scheduler.jar:]<br>
at org.ovirt.engine.core.utils.<wbr>timer.JobWrapper.execute(<wbr>JobWrapper.java:55)<br>
[scheduler.jar:]<br>
at org.quartz.core.JobRunShell.<wbr>run(JobRunShell.java:213) [quartz.jar:]<br>
at java.util.concurrent.<wbr>Executors$RunnableAdapter.<wbr>call(Executors.java:511)<br>
[rt.jar:1.8.0_131]<br>
at java.util.concurrent.<wbr>FutureTask.run(FutureTask.<wbr>java:266) [rt.jar:1.8.0_131]<br>
at java.util.concurrent.<wbr>ThreadPoolExecutor.runWorker(<wbr>ThreadPoolExecutor.java:1142)<br>
[rt.jar:1.8.0_131]<br>
at java.util.concurrent.<wbr>ThreadPoolExecutor$Worker.run(<wbr>ThreadPoolExecutor.java:617)<br>
[rt.jar:1.8.0_131]<br>
at java.lang.Thread.run(Thread.<wbr>java:748) [rt.jar:1.8.0_131]<br>
Caused by: java.lang.NullPointerException<br>
at org.ovirt.engine.core.<wbr>vdsbroker.monitoring.<wbr>VmStatsRefresher.lambda$<wbr>processDevices$0(<wbr>VmStatsRefresher.java:44)<br>
[vdsbroker.jar:]<br>
at java.util.stream.<wbr>ReferencePipeline$2$1.accept(<wbr>ReferencePipeline.java:174)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>ReferencePipeline$3$1.accept(<wbr>ReferencePipeline.java:193)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>ReferencePipeline$2$1.accept(<wbr>ReferencePipeline.java:175)<br>
[rt.jar:1.8.0_131]<br>
at java.util.ArrayList$<wbr>ArrayListSpliterator.<wbr>forEachRemaining(ArrayList.<wbr>java:1374)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>AbstractPipeline.copyInto(<wbr>AbstractPipeline.java:481)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>AbstractPipeline.<wbr>wrapAndCopyInto(<wbr>AbstractPipeline.java:471)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.ForEachOps$<wbr>ForEachOp.evaluateSequential(<wbr>ForEachOps.java:151)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.ForEachOps$<wbr>ForEachOp$OfRef.<wbr>evaluateSequential(ForEachOps.<wbr>java:174)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>AbstractPipeline.evaluate(<wbr>AbstractPipeline.java:234)<br>
[rt.jar:1.8.0_131]<br>
at java.util.stream.<wbr>ReferencePipeline.forEach(<wbr>ReferencePipeline.java:418)<br>
[rt.jar:1.8.0_131]<br>
at org.ovirt.engine.core.<wbr>vdsbroker.monitoring.<wbr>VmStatsRefresher.<wbr>processDevices(<wbr>VmStatsRefresher.java:46)<br>
[vdsbroker.jar:]<br>
at org.ovirt.engine.core.<wbr>vdsbroker.monitoring.<wbr>PollVmStatsRefresher.poll(<wbr>PollVmStatsRefresher.java:44)<br>
[vdsbroker.jar:]<br>
... 11 more<br>
<br>
Thanks,<br>
Piotr<br>
<div><div class="h5"><br>
<br>
On Sun, May 14, 2017 at 6:12 PM, Nir Soffer <<a href="mailto:nsoffer@redhat.com">nsoffer@redhat.com</a>> wrote:<br>
> On Sun, May 14, 2017 at 5:14 PM Shlomo Ben David <<a href="mailto:sbendavi@redhat.com">sbendavi@redhat.com</a>><br>
> wrote:<br>
>><br>
>> Link to suspected patches: N/A<br>
>> Link to Job:<br>
>> <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/6665/</a><br>
>> Link to all logs:<br>
>> <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/6665/artifact/exported-<wbr>artifacts/basic-suit-master-<wbr>el7/test_logs/basic-suite-<wbr>master/post-006_migrations.py/</a><br>
>><br>
>> Error snippet from the log:<br>
><br>
><br>
>> Traceback (most recent call last):<br>
>><br>
>> File "/usr/lib/python2.7/site-<wbr>packages/vdsm/storage/task.py"<wbr>, line 878,<br>
>> in _run<br>
>> return fn(*args, **kargs)<br>
>> File "/usr/lib/python2.7/site-<wbr>packages/vdsm/logUtils.py", line 52, in<br>
>> wrapper<br>
>> res = f(*args, **kwargs)<br>
>> File "/usr/share/vdsm/storage/hsm.<wbr>py", line 2132, in getAllTasksStatuses<br>
>> allTasksStatus = self._pool.<wbr>getAllTasksStatuses()<br>
>> File "/usr/lib/python2.7/site-<wbr>packages/vdsm/storage/<wbr>securable.py", line<br>
>> 77, in wrapper<br>
>> raise SecureError("Secured object is not in safe state")<br>
>> SecureError: Secured object is not in safe state<br>
><br>
><br>
> This means that getAllTasksStatuses (SPM verb) was called on a host<br>
> which is not the SPM. This is an error in the caller, not in vdsm.<br>
><br>
> We already have a bug abut this:<br>
> <a href="https://bugzilla.redhat.com/1426429" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/<wbr>1426429</a><br>
><br>
> However this build logs do not contain enough info. These<br>
> patches adds more info that can help to identify the caller:<br>
> <a href="https://gerrit.ovirt.org/76771" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/76771</a><br>
><br>
> This error is not related to the real error in this build.<br>
><br>
> Nir<br>
><br>
</div></div>> ______________________________<wbr>_________________<br>
> Devel mailing list<br>
> <a href="mailto:Devel@ovirt.org">Devel@ovirt.org</a><br>
> <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/devel</a><br>
______________________________<wbr>_________________<br>
Devel mailing list<br>
<a href="mailto:Devel@ovirt.org">Devel@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/devel</a><br>
</blockquote></div><br></div>