[ OST Failure Report ] [ oVirt master ] [ 14/05/2017 ] [ test-repo_ovirt_experimental_master ]

Hi, Test failed: [ test-repo_ovirt_experimental_master ] Link to suspected patches: N/A Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/ Link to all logs: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/artifa... Error snippet from the log: <error> 2017-05-14 03:56:31,782-0400 ERROR (jsonrpc/3) [storage.TaskManager.Task] (Task='661acb48-075b-494a-8fb4-64c04ed99bcb') Unexpected error (task:871) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 878, in _run return fn(*args, **kargs) File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2172, in getAllTasksInfo allTasksInfo = self._pool.getAllTasksInfo() File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper raise SecureError("Secured object is not in safe state") SecureError: Secured object is not in safe state 2017-05-14 03:56:31,785-0400 INFO (jsonrpc/3) [storage.TaskManager.Task] (Task='661acb48-075b-494a-8fb4-64c04ed99bcb') aborting: Task is aborted: u'Secured object is not in safe state' - code 100 (task:1176) 2017-05-14 03:56:31,786-0400 ERROR (jsonrpc/3) [storage.Dispatcher] Secured object is not in safe state (dispatcher:81) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 73, in wrapper result = ctask.prepare(func, *args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 106, in wrapper return m(self, *a, **kw) File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1184, in prepare raise self.error SecureError: Secured object is not in safe state 2017-05-14 03:56:31,786-0400 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.getAllTasksInfo failed (error 654) in 0.00 seconds (__init__:577) 2017-05-14 03:56:31,796-0400 INFO (jsonrpc/5) [dispatcher] Run and protect: getAllTasksStatuses(spUUID=None, options=None) (logUtils:51) 2017-05-14 03:56:31,797-0400 ERROR (jsonrpc/5) [storage.TaskManager.Task] (Task='568f515e-365b-47e4-8b2f-24e15deb6ae9') Unexpected error (task:871) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 878, in _run return fn(*args, **kargs) File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2132, in getAllTasksStatuses allTasksStatus = self._pool.getAllTasksStatuses() File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper raise SecureError("Secured object is not in safe state") SecureError: Secured object is not in safe state 2017-05-14 03:56:31,797-0400 INFO (jsonrpc/5) [storage.TaskManager.Task] (Task='568f515e-365b-47e4-8b2f-24e15deb6ae9') aborting: Task is aborted: u'Secured object is not in safe state' - code 100 (task:1176) 2017-05-14 03:56:31,798-0400 ERROR (jsonrpc/5) [storage.Dispatcher] Secured object is not in safe state (dispatcher:81) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 73, in wrapper result = ctask.prepare(func, *args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 106, in wrapper return m(self, *a, **kw) File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1184, in prepare raise self.error SecureError: Secured object is not in safe state <error> Best Regards, Shlomi Ben-David | Software Engineer | Red Hat ISRAEL RHCSA | RHCVA | RHCE IRC: shlomibendavid (on #rhev-integ, #rhev-dev, #rhev-ci) OPEN SOURCE - 1 4 011 && 011 4 1

On Sun, May 14, 2017 at 5:14 PM Shlomo Ben David <sbendavi@redhat.com> wrote:
Link to suspected patches: N/A Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/ Link to all logs: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/artifa...
Error snippet from the log:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 878,
in _run return fn(*args, **kargs) File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2132, in getAllTasksStatuses allTasksStatus = self._pool.getAllTasksStatuses() File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper raise SecureError("Secured object is not in safe state") SecureError: Secured object is not in safe state
This means that getAllTasksStatuses (SPM verb) was called on a host which is not the SPM. This is an error in the caller, not in vdsm. We already have a bug abut this: https://bugzilla.redhat.com/1426429 However this build logs do not contain enough info. These patches adds more info that can help to identify the caller: https://gerrit.ovirt.org/76771 This error is not related to the real error in this build. Nir

I agree with Nir that the above issue is not related to the job failure. I see that there is a timeout for: 2017-05-14 04:10:04,215-04 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient] (org.ovirt.thread.pool-7-thread-12) [] Message sent: SEND destination:jms.topic.vdsm_requests content-length:377 reply-to:jms.topic.vdsm_responses <JsonRpcRequest id: "846f4741-5c7d-4396-8ed7-ce995d426728", method: Host.setupNetworks, params: {networks={Migration_Net={vlan=200, ipv6autoconf=false, nic=eth0, bridged=false, dhcpv6=false, mtu=1500, ipv6addr=2001:0db8:85a3:0000:0000:574c:14ea:0a02/64, switch=legacy}}, bondings={}, options={connectivityTimeout=120, connectivityCheck=true}}> and the timeout occurred at: 2017-05-14 04:10:07,535-04 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Message received: {"jsonrpc":"2.0","error":{"code":"lago-basic-suite-master-host1:286640862","message":"Vds timeout occured"},"id":null} I see in supervdsm log: MainProcess|jsonrpc/1::INFO::2017-05-14 04:10:05,566::netconfpersistence::52::root::(setNetwork) Adding network Migration_Net({u'ipv6autoconf': False, 'nameservers': [], u'nic': u'eth0', u'vlan': 200, u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': False, u'ipv6addr': u'2001:0db8:85a3:0000:0000:574c:14ea:0a02/64', 'defaultRoute': False, 'bootproto': 'none'}) ... MainProcess|jsonrpc/1::WARNING::2017-05-14 04:10:05,580::ifcfg::247::root::(_addSourceRoute) Invalid input for source routing: name=eth0.200, addr=None, netmask=None, gateway=None netlink/events::DEBUG::2017-05-14 04:10:05,581::concurrent::184::root::(run) START thread <Thread(netlink/events, started daemon 139829765465856)> (func=<bound method Monitor._scan of <vdsm.network.netlink.monitor.Monitor object at 0x7f2c8c01ebd0>>, args=(), kwargs={}) MainProcess|jsonrpc/1::WARNING::2017-05-14 04:10:05,581::ifcfg::894::root::(_ignore_missing_device) eth0.200 didn't exist (yet) and so IPv6 wasn't enabled. My question is why only 3 seconds to timeout setupNetworks. I found one more exception not related to the above issue which we should fix: 2017-05-14 04:08:08,924-04 DEBUG [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler10) [58fd90c1] Exception: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor178.invoke(Unknown Source) [:1.8.0_131] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_131] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_131] at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:83) [scheduler.jar:] at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:55) [scheduler.jar:] at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_131] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131] Caused by: java.lang.NullPointerException at org.ovirt.engine.core.vdsbroker.monitoring.VmStatsRefresher.lambda$processDevices$0(VmStatsRefresher.java:44) [vdsbroker.jar:] at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) [rt.jar:1.8.0_131] at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) [rt.jar:1.8.0_131] at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) [rt.jar:1.8.0_131] at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) [rt.jar:1.8.0_131] at org.ovirt.engine.core.vdsbroker.monitoring.VmStatsRefresher.processDevices(VmStatsRefresher.java:46) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:44) [vdsbroker.jar:] ... 11 more Thanks, Piotr On Sun, May 14, 2017 at 6:12 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, May 14, 2017 at 5:14 PM Shlomo Ben David <sbendavi@redhat.com> wrote:
Link to suspected patches: N/A Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/ Link to all logs: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/artifa...
Error snippet from the log:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 878, in _run return fn(*args, **kargs) File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2132, in getAllTasksStatuses allTasksStatus = self._pool.getAllTasksStatuses() File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper raise SecureError("Secured object is not in safe state") SecureError: Secured object is not in safe state
This means that getAllTasksStatuses (SPM verb) was called on a host which is not the SPM. This is an error in the caller, not in vdsm.
We already have a bug abut this: https://bugzilla.redhat.com/1426429
However this build logs do not contain enough info. These patches adds more info that can help to identify the caller: https://gerrit.ovirt.org/76771
This error is not related to the real error in this build.
Nir
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

On Mon, May 15, 2017 at 12:40 PM, Piotr Kliczewski < piotr.kliczewski@gmail.com> wrote:
I agree with Nir that the above issue is not related to the job failure.
I see that there is a timeout for:
2017-05-14 04:10:04,215-04 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient] (org.ovirt.thread.pool-7-thread-12) [] Message sent: SEND destination:jms.topic.vdsm_requests content-length:377 reply-to:jms.topic.vdsm_responses
<JsonRpcRequest id: "846f4741-5c7d-4396-8ed7-ce995d426728", method: Host.setupNetworks, params: {networks={Migration_Net={vlan=200, ipv6autoconf=false, nic=eth0, bridged=false, dhcpv6=false, mtu=1500, ipv6addr=2001:0db8:85a3:0000:0000:574c:14ea:0a02/64, switch=legacy}}, bondings={}, options={connectivityTimeout=120, connectivityCheck=true}}>
and the timeout occurred at:
2017-05-14 04:10:07,535-04 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Message received: {"jsonrpc":"2.0","error":{"code":"lago-basic-suite- master-host1:286640862","message":"Vds timeout occured"},"id":null}
I see in supervdsm log:
MainProcess|jsonrpc/1::INFO::2017-05-14 04:10:05,566::netconfpersistence::52::root::(setNetwork) Adding network Migration_Net({u'ipv6autoconf': False, 'nameservers': [], u'nic': u'eth0', u'vlan': 200, u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': False, u'ipv6addr': u'2001:0db8:85a3:0000:0000:574c:14ea:0a02/64', 'defaultRoute': False, 'bootproto': 'none'}) ... MainProcess|jsonrpc/1::WARNING::2017-05-14 04:10:05,580::ifcfg::247::root::(_addSourceRoute) Invalid input for source routing: name=eth0.200, addr=None, netmask=None, gateway=None netlink/events::DEBUG::2017-05-14 04:10:05,581::concurrent::184::root::(run) START thread <Thread(netlink/events, started daemon 139829765465856)> (func=<bound method Monitor._scan of <vdsm.network.netlink.monitor.Monitor object at 0x7f2c8c01ebd0>>, args=(), kwargs={}) MainProcess|jsonrpc/1::WARNING::2017-05-14 04:10:05,581::ifcfg::894::root::(_ignore_missing_device) eth0.200 didn't exist (yet) and so IPv6 wasn't enabled.
My question is why only 3 seconds to timeout setupNetworks.
I found one more exception not related to the above issue which we should fix:
Ack. This one is easy.
2017-05-14 04:08:08,924-04 DEBUG [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler10) [58fd90c1] Exception: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor178.invoke(Unknown Source) [:1.8.0_131] at sun.reflect.DelegatingMethodAccessorImpl.invoke( DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_131] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_131] at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod( JobWrapper.java:83) [scheduler.jar:] at org.ovirt.engine.core.utils.timer.JobWrapper.execute( JobWrapper.java:55) [scheduler.jar:] at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_131] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor.runWorker( ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor$Worker.run( ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131] Caused by: java.lang.NullPointerException at org.ovirt.engine.core.vdsbroker.monitoring.VmStatsRefresher.lambda$ processDevices$0(VmStatsRefresher.java:44) [vdsbroker.jar:] at java.util.stream.ReferencePipeline$2$1.accept( ReferencePipeline.java:174) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline$3$1.accept( ReferencePipeline.java:193) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline$2$1.accept( ReferencePipeline.java:175) [rt.jar:1.8.0_131] at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList. java:1374) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.wrapAndCopyInto( AbstractPipeline.java:471) [rt.jar:1.8.0_131] at java.util.stream.ForEachOps$ForEachOp.evaluateSequential( ForEachOps.java:151) [rt.jar:1.8.0_131] at java.util.stream.ForEachOps$ForEachOp$OfRef. evaluateSequential(ForEachOps.java:174) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) [rt.jar:1.8.0_131] at org.ovirt.engine.core.vdsbroker.monitoring.VmStatsRefresher. processDevices(VmStatsRefresher.java:46) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll( PollVmStatsRefresher.java:44) [vdsbroker.jar:] ... 11 more
Thanks, Piotr
On Sun, May 14, 2017 at 5:14 PM Shlomo Ben David <sbendavi@redhat.com> wrote:
Link to suspected patches: N/A Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/ Link to all logs: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
master/6665/artifact/exported-artifacts/basic-suit-master- el7/test_logs/basic-suite-master/post-006_migrations.py/
Error snippet from the log:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 878, in _run return fn(*args, **kargs) File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2132, in getAllTasksStatuses allTasksStatus = self._pool.getAllTasksStatuses() File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
On Sun, May 14, 2017 at 6:12 PM, Nir Soffer <nsoffer@redhat.com> wrote: line
77, in wrapper raise SecureError("Secured object is not in safe state") SecureError: Secured object is not in safe state
This means that getAllTasksStatuses (SPM verb) was called on a host which is not the SPM. This is an error in the caller, not in vdsm.
We already have a bug abut this: https://bugzilla.redhat.com/1426429
However this build logs do not contain enough info. These patches adds more info that can help to identify the caller: https://gerrit.ovirt.org/76771
This error is not related to the real error in this build.
Nir
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

Piotr, How do you identify the connection between the 2 message, why do you think that the timeout is related to the first message you've mentioned? Regards, Yevgeny On Mon, May 15, 2017 at 12:40 PM, Piotr Kliczewski < piotr.kliczewski@gmail.com> wrote:
I agree with Nir that the above issue is not related to the job failure.
I see that there is a timeout for:
2017-05-14 04:10:04,215-04 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient] (org.ovirt.thread.pool-7-thread-12) [] Message sent: SEND destination:jms.topic.vdsm_requests content-length:377 reply-to:jms.topic.vdsm_responses
<JsonRpcRequest id: "846f4741-5c7d-4396-8ed7-ce995d426728", method: Host.setupNetworks, params: {networks={Migration_Net={vlan=200, ipv6autoconf=false, nic=eth0, bridged=false, dhcpv6=false, mtu=1500, ipv6addr=2001:0db8:85a3:0000:0000:574c:14ea:0a02/64, switch=legacy}}, bondings={}, options={connectivityTimeout=120, connectivityCheck=true}}>
and the timeout occurred at:
2017-05-14 04:10:07,535-04 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Message received: {"jsonrpc":"2.0","error":{"code":"lago-basic-suite- master-host1:286640862","message":"Vds timeout occured"},"id":null}
I see in supervdsm log:
MainProcess|jsonrpc/1::INFO::2017-05-14 04:10:05,566::netconfpersistence::52::root::(setNetwork) Adding network Migration_Net({u'ipv6autoconf': False, 'nameservers': [], u'nic': u'eth0', u'vlan': 200, u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': False, u'ipv6addr': u'2001:0db8:85a3:0000:0000:574c:14ea:0a02/64', 'defaultRoute': False, 'bootproto': 'none'}) ... MainProcess|jsonrpc/1::WARNING::2017-05-14 04:10:05,580::ifcfg::247::root::(_addSourceRoute) Invalid input for source routing: name=eth0.200, addr=None, netmask=None, gateway=None netlink/events::DEBUG::2017-05-14 04:10:05,581::concurrent::184::root::(run) START thread <Thread(netlink/events, started daemon 139829765465856)> (func=<bound method Monitor._scan of <vdsm.network.netlink.monitor.Monitor object at 0x7f2c8c01ebd0>>, args=(), kwargs={}) MainProcess|jsonrpc/1::WARNING::2017-05-14 04:10:05,581::ifcfg::894::root::(_ignore_missing_device) eth0.200 didn't exist (yet) and so IPv6 wasn't enabled.
My question is why only 3 seconds to timeout setupNetworks.
I found one more exception not related to the above issue which we should fix:
2017-05-14 04:08:08,924-04 DEBUG [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler10) [58fd90c1] Exception: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor178.invoke(Unknown Source) [:1.8.0_131] at sun.reflect.DelegatingMethodAccessorImpl.invoke( DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_131] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_131] at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod( JobWrapper.java:83) [scheduler.jar:] at org.ovirt.engine.core.utils.timer.JobWrapper.execute( JobWrapper.java:55) [scheduler.jar:] at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_131] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor.runWorker( ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor$Worker.run( ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131] Caused by: java.lang.NullPointerException at org.ovirt.engine.core.vdsbroker.monitoring.VmStatsRefresher.lambda$ processDevices$0(VmStatsRefresher.java:44) [vdsbroker.jar:] at java.util.stream.ReferencePipeline$2$1.accept( ReferencePipeline.java:174) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline$3$1.accept( ReferencePipeline.java:193) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline$2$1.accept( ReferencePipeline.java:175) [rt.jar:1.8.0_131] at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList. java:1374) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.wrapAndCopyInto( AbstractPipeline.java:471) [rt.jar:1.8.0_131] at java.util.stream.ForEachOps$ForEachOp.evaluateSequential( ForEachOps.java:151) [rt.jar:1.8.0_131] at java.util.stream.ForEachOps$ForEachOp$OfRef. evaluateSequential(ForEachOps.java:174) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) [rt.jar:1.8.0_131] at org.ovirt.engine.core.vdsbroker.monitoring.VmStatsRefresher. processDevices(VmStatsRefresher.java:46) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll( PollVmStatsRefresher.java:44) [vdsbroker.jar:] ... 11 more
Thanks, Piotr
On Sun, May 14, 2017 at 5:14 PM Shlomo Ben David <sbendavi@redhat.com> wrote:
Link to suspected patches: N/A Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/ Link to all logs: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_
master/6665/artifact/exported-artifacts/basic-suit-master- el7/test_logs/basic-suite-master/post-006_migrations.py/
Error snippet from the log:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 878, in _run return fn(*args, **kargs) File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2132, in getAllTasksStatuses allTasksStatus = self._pool.getAllTasksStatuses() File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
On Sun, May 14, 2017 at 6:12 PM, Nir Soffer <nsoffer@redhat.com> wrote: line
77, in wrapper raise SecureError("Secured object is not in safe state") SecureError: Secured object is not in safe state
This means that getAllTasksStatuses (SPM verb) was called on a host which is not the SPM. This is an error in the caller, not in vdsm.
We already have a bug abut this: https://bugzilla.redhat.com/1426429
However this build logs do not contain enough info. These patches adds more info that can help to identify the caller: https://gerrit.ovirt.org/76771
This error is not related to the real error in this build.
Nir
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

Yevgeny, There is message id which matches request and response but in this case I made an assumption about setupNetworks being exclusive operation. The following setupNetwoks was the last one which was last executed. Thanks, Piotr On Thu, May 18, 2017 at 1:28 PM, Yevgeny Zaspitsky <yzaspits@redhat.com> wrote:
Piotr,
How do you identify the connection between the 2 message, why do you think that the timeout is related to the first message you've mentioned?
Regards, Yevgeny
On Mon, May 15, 2017 at 12:40 PM, Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:
I agree with Nir that the above issue is not related to the job failure.
I see that there is a timeout for:
2017-05-14 04:10:04,215-04 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompCommonClient] (org.ovirt.thread.pool-7-thread-12) [] Message sent: SEND destination:jms.topic.vdsm_requests content-length:377 reply-to:jms.topic.vdsm_responses
<JsonRpcRequest id: "846f4741-5c7d-4396-8ed7-ce995d426728", method: Host.setupNetworks, params: {networks={Migration_Net={vlan=200, ipv6autoconf=false, nic=eth0, bridged=false, dhcpv6=false, mtu=1500, ipv6addr=2001:0db8:85a3:0000:0000:574c:14ea:0a02/64, switch=legacy}}, bondings={}, options={connectivityTimeout=120, connectivityCheck=true}}>
and the timeout occurred at:
2017-05-14 04:10:07,535-04 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Message received:
{"jsonrpc":"2.0","error":{"code":"lago-basic-suite-master-host1:286640862","message":"Vds timeout occured"},"id":null}
I see in supervdsm log:
MainProcess|jsonrpc/1::INFO::2017-05-14 04:10:05,566::netconfpersistence::52::root::(setNetwork) Adding network Migration_Net({u'ipv6autoconf': False, 'nameservers': [], u'nic': u'eth0', u'vlan': 200, u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': False, u'ipv6addr': u'2001:0db8:85a3:0000:0000:574c:14ea:0a02/64', 'defaultRoute': False, 'bootproto': 'none'}) ... MainProcess|jsonrpc/1::WARNING::2017-05-14 04:10:05,580::ifcfg::247::root::(_addSourceRoute) Invalid input for source routing: name=eth0.200, addr=None, netmask=None, gateway=None netlink/events::DEBUG::2017-05-14 04:10:05,581::concurrent::184::root::(run) START thread <Thread(netlink/events, started daemon 139829765465856)> (func=<bound method Monitor._scan of <vdsm.network.netlink.monitor.Monitor object at 0x7f2c8c01ebd0>>, args=(), kwargs={}) MainProcess|jsonrpc/1::WARNING::2017-05-14 04:10:05,581::ifcfg::894::root::(_ignore_missing_device) eth0.200 didn't exist (yet) and so IPv6 wasn't enabled.
My question is why only 3 seconds to timeout setupNetworks.
I found one more exception not related to the above issue which we should fix:
2017-05-14 04:08:08,924-04 DEBUG [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler10) [58fd90c1] Exception: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor178.invoke(Unknown Source) [:1.8.0_131] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_131] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_131] at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:83) [scheduler.jar:] at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:55) [scheduler.jar:] at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_131] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131] Caused by: java.lang.NullPointerException at org.ovirt.engine.core.vdsbroker.monitoring.VmStatsRefresher.lambda$processDevices$0(VmStatsRefresher.java:44) [vdsbroker.jar:] at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) [rt.jar:1.8.0_131] at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) [rt.jar:1.8.0_131] at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) [rt.jar:1.8.0_131] at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) [rt.jar:1.8.0_131] at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) [rt.jar:1.8.0_131] at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) [rt.jar:1.8.0_131] at org.ovirt.engine.core.vdsbroker.monitoring.VmStatsRefresher.processDevices(VmStatsRefresher.java:46) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:44) [vdsbroker.jar:] ... 11 more
Thanks, Piotr
On Sun, May 14, 2017 at 6:12 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, May 14, 2017 at 5:14 PM Shlomo Ben David <sbendavi@redhat.com> wrote:
Link to suspected patches: N/A Link to Job: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/ Link to all logs:
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/6665/artifa...
Error snippet from the log:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 878, in _run return fn(*args, **kargs) File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2132, in getAllTasksStatuses allTasksStatus = self._pool.getAllTasksStatuses() File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper raise SecureError("Secured object is not in safe state") SecureError: Secured object is not in safe state
This means that getAllTasksStatuses (SPM verb) was called on a host which is not the SPM. This is an error in the caller, not in vdsm.
We already have a bug abut this: https://bugzilla.redhat.com/1426429
However this build logs do not contain enough info. These patches adds more info that can help to identify the caller: https://gerrit.ovirt.org/76771
This error is not related to the real error in this build.
Nir
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
participants (5)
-
Arik Hadas
-
Nir Soffer
-
Piotr Kliczewski
-
Shlomo Ben David
-
Yevgeny Zaspitsky