[ovirt-devel] Failure on start VM in ovirt-system-tests from patches merged to master on the 25/10/2016

Piotr Kliczewski pkliczew at redhat.com
Thu Oct 27 07:03:50 UTC 2016


Heartbeat issue caused channel being closed and set to null which caused a
surprise for a reactor.
It seems that we may want to schedule closure of channel in this situation
instead of calling it explicitly.

What was the cause of the heartbeat issues?

On Thu, Oct 27, 2016 at 4:12 AM, Allon Mureinik <amureini at redhat.com> wrote:

> Seems like we cleared up the engine issues related to the recent injection
> changes.
>
> I am now seeing stop errors, e.g.:
>
> {"jsonrpc": "2.0", "id": "ea0c564f-bc17-4fc2-8f1b-67c4d28257c6", "result": {"cpuStatistics": {"1": {"cpuUser": "3.07", "nodeIndex": 0, "cpuSys": "3.00", "cpuIdle": "93.93"}, "0": {"cpuUser": "1.67", "nodeIndex": 0, "cpuSys": "2.07", "cpuIdle": "96.26"}}, "numaNodeMemFree": {"0": {"memPercent": 83, "memFree": "359"}}, "memShared": 0, "thpState": "always", "ksmMergeAcrossNodes": true, "vmCount": 0, "memUsed": "20", "storageDomains": {"b2bb3220-1eb3-426a-90c2-5e236aefbe1a": {"code": 0, "actual": true, "version": 0, "acquired": true, "delay": "0.000840117", "lastCheck": "7.1", "valid": true}, "3130195a-73f9-4490-b554-98a9205cead6": {"code": 0, "actual": true, "version": 4, "acquired": true, "delay": "0.00150771", "lastCheck": "7.5", "valid": true}, "1a9e202b-83b7-4bdc-9b0c-e76b83676068": {"code": 0, "actual": true, "version": 4, "acquired": true, "delay": "0.000590956",
> 2016-10-26 21:51:09,878 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener] (DefaultQuartzScheduler7) [6d206bd1] Rescheduling DEFAULT.org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods#-9223372036854775783 as there is no unfired trigger.
> 2016-10-26 21:51:28,705 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [383dd6a0] Heartbeat exceeded. Closing channel
> 2016-10-26 21:51:28,708 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [383dd6a0] Internal server error: null: java.lang.NullPointerException
> 	at org.ovirt.vdsm.jsonrpc.client.reactors.SSLClient.write(SSLClient.java:102) [vdsm-jsonrpc-java-client.jar:]
> 	at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.processOutgoing(ReactorClient.java:245) [vdsm-jsonrpc-java-client.jar:]
> 	at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.process(ReactorClient.java:208) [vdsm-jsonrpc-java-client.jar:]
> 	at org.ovirt.vdsm.jsonrpc.client.reactors.SSLClient.process(SSLClient.java:125) [vdsm-jsonrpc-java-client.jar:]
> 	at org.ovirt.vdsm.jsonrpc.client.reactors.Reactor.processChannels(Reactor.java:89) [vdsm-jsonrpc-java-client.jar:]
> 	at org.ovirt.vdsm.jsonrpc.client.reactors.Reactor.run(Reactor.java:65) [vdsm-jsonrpc-java-client.jar:]
>
>
> Piotr - any idea?
>
>
> On Wed, Oct 26, 2016 at 9:34 PM, Nadav Goldin <ngoldin at redhat.com> wrote:
>
>> Its running now:
>> http://jenkins.ovirt.org/job/ovirt-engine_master_build-artif
>> acts-fc24-x86_64/1037/,
>> when it finishes it will trigger the deploy job which will trigger the
>> experimental job. The build_artifacts job is triggered approximately
>> every 2 hours(depends on the load).
>> Inside the build-artifacts job you can see the last
>> commit(https://gerrit.ovirt.org/65776, which is already after your
>> commit)
>>
>>
>> On Wed, Oct 26, 2016 at 9:26 PM, Allon Mureinik <amureini at redhat.com>
>> wrote:
>> > @Infra - the last experimental job I saw was from ~17:30 local Israel
>> time.
>> > Any idea why another one isn't being triggered (or am I just being
>> daft)?
>> >
>> > On Wed, Oct 26, 2016 at 6:27 PM, Allon Mureinik <amureini at redhat.com>
>> wrote:
>> >>
>> >> Yipes.
>> >> [1] should fix that, waiting for the CI to run to merge.
>> >>
>> >> [1] https://gerrit.ovirt.org/#/c/65768/
>> >>
>> >> On Wed, Oct 26, 2016 at 3:42 PM, Nadav Goldin <ngoldin at redhat.com>
>> wrote:
>> >>>
>> >>> Unfortunately it is still failing, see[1], the repository used was
>> >>> built from commit [2]. If you want to check the logs same links
>> >>> apply(just replace build number 2759->2782)
>> >>>
>> >>>
>> >>>
>> >>> [1] http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_ma
>> ster/2782
>> >>> [2] https://gerrit.ovirt.org/#/c/65740/
>> >>>
>> >>> On Wed, Oct 26, 2016 at 11:33 AM, Tal Nisan <tnisan at redhat.com>
>> wrote:
>> >>> > Should be fixed now on master as those patches were just merged:
>> >>> >
>> >>> > https://gerrit.ovirt.org/#/c/65738/ - "Move InjectorRule to
>> VdsBroker":
>> >>> > aligns the InjectorRule's package with the Injector so it can
>> >>> > essentially be
>> >>> > used wherever the Injector is used [+12, -4]
>> >>> > https://gerrit.ovirt.org/#/c/65739 - "InjectorRule: override
>> injector
>> >>> > anyway": Fixes a bug in InjectorRule where the Injector is only
>> mocked
>> >>> > if
>> >>> > you call InjectorRule.bind [+6, -1]
>> >>> > https://gerrit.ovirt.org/#/c/65740 - "core: InjectorRule for
>> injecting
>> >>> > members": Fixes a bug in InjectorRule to allow using
>> >>> > Injector.injectMembers
>> >>> > when using it [+18, -5]
>> >>> > https://gerrit.ovirt.org/#/c/65725 - "core: Fix AuditLogging": The
>> >>> > actual
>> >>> > fix. Basically, goes over all the places that create an
>> >>> > AuditLoggableBase
>> >>> > that needs injecting and take care of it [+155, -160]
>> >>> >
>> >>> >
>> >>> > On Wed, Oct 26, 2016 at 10:04 AM, Nadav Goldin <ngoldin at redhat.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi,
>> >>> >> We have new failure on OST from patches merged to master yesterday,
>> >>> >> the failure started after the merge of [1], but as there were
>> quite a
>> >>> >> few patches merged quickly I can't make sure it is the one causing
>> >>> >> it(OST aren't ran per-patch).
>> >>> >> The test that fails is [2] when attempting to start the VM.
>> >>> >>
>> >>> >> The error from the API side:
>> >>> >>
>> >>> >> RequestError:
>> >>> >> status: 500
>> >>> >> reason: Internal Server Error
>> >>> >> detail: javax.ejb.EJBException: java.lang.NullPointerException
>> >>> >>     at
>> >>> >>
>> >>> >> org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(
>> CMTTxInterceptor.java:213)
>> >>> >>     at
>> >>> >>
>> >>> >> org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInte
>> rceptor.java:265)
>> >>> >>     at
>> >>> >>
>> >>> >> org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxIntercep
>> tor.java:374)
>> >>> >>     at
>> >>> >>
>> >>> >> org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTT
>> xInterceptor.java:243)
>> >>> >>     at
>> >>> >>
>> >>> >> org.jboss.invocation.InterceptorContext.proceed(InterceptorC
>> ontext.java:340)
>> >>> >> ....
>> >>> >>
>> >>> >> In the engine logs there are a few 'java.lang.NullPointerExceptio
>> n'
>> >>> >> errors:
>> >>> >>
>> >>> >> 2016-10-25 11:53:52,845 INFO
>> >>> >> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLo
>> gableBase]
>> >>> >> (org.ovirt.thread.pool-6-thread-2) [5e6a88be] Failed to get vds
>> >>> >> 'd60db21f-95f0-487b-9f17-44861e2610a7', error: null
>> >>> >> 2016-10-25 11:53:52,864 DEBUG
>> >>> >> [org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
>> >>> >> (DefaultQuartzScheduler5) [] Rescheduling
>> >>> >>
>> >>> >>
>> >>> >> DEFAULT.org.ovirt.engine.core.bll.tasks.AsyncTaskManager.tim
>> erElapsed#-9223372036854775787
>> >>> >> as there is no unfired trigger.
>> >>> >> ...
>> >>> >> 2016-10-25 11:53:52,845 DEBUG
>> >>> >> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLo
>> gableBase]
>> >>> >> (org.ovirt.thread.pool-6-thread-2) [5e6a88be] Exception:
>> >>> >> java.lang.NullPointerException
>> >>> >>         at
>> >>> >>
>> >>> >> org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLog
>> ableBase.getVdsStatic(AuditLogableBase.java:633)
>> >>> >> [dal.jar:]
>> >>> >>         at
>> >>> >>
>> >>> >> org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLog
>> ableBase.getVdsName(AuditLogableBase.java:504)
>> >>> >> [dal.jar:]
>> >>> >> ...
>> >>> >> 2016-10-25 11:53:52,837 INFO
>> >>> >> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLo
>> gableBase]
>> >>> >> (org.ovirt.thread.pool-6-thread-2) [5e6a88be] Failed to get vds
>> >>> >> 'd60db21f-95f0-487b-9f17-44861e2610a7', error: null
>> >>> >> 2016-10-25 11:53:52,837 DEBUG
>> >>> >> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLo
>> gableBase]
>> >>> >> (org.ovirt.thread.pool-6-thread-2) [5e6a88be] Exception:
>> >>> >> java.lang.NullPointerException
>> >>> >>         at
>> >>> >>
>> >>> >> org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLog
>> ableBase.getVdsStatic(AuditLogableBase.java:633)
>> >>> >> [dal.jar:]
>> >>> >>         at
>> >>> >>
>> >>> >> org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLog
>> ableBase.getVdsName(AuditLogableBase.java:504)
>> >>> >> [dal.jar:]
>> >>> >> ...
>> >>> >>
>> >>> >> The full engine logs can be found here[3] and the entire test suite
>> >>> >> logs here[4].
>> >>> >>
>> >>> >> Can anyone have a look?
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Nadav.
>> >>> >>
>> >>> >>
>> >>> >> [1] https://gerrit.ovirt.org/#/c/65198/
>> >>> >> [2]
>> >>> >>
>> >>> >> https://github.com/oVirt/ovirt-system-tests/blob/master/
>> basic_suite_master/test-scenarios/004_basic_sanity.py#L322
>> >>> >> [3]
>> >>> >>
>> >>> >> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_ma
>> ster/2759/artifact/exported-artifacts/basic_suite_master.sh-
>> fc24/exported-artifacts/test_logs/basic_suite_master/post-
>> 004_basic_sanity.py/*zip*/post-004_basic_sanity.py.zip
>> >>> >> [4]
>> >>> >>
>> >>> >> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_ma
>> ster/2759/artifact/exported-artifacts/basic_suite_master.sh-
>> fc24/exported-artifacts/
>> >>> >> _______________________________________________
>> >>> >> Devel mailing list
>> >>> >> Devel at ovirt.org
>> >>> >> http://lists.ovirt.org/mailman/listinfo/devel
>> >>> >
>> >>> >
>> >>
>> >>
>> >
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ovirt.org/pipermail/infra/attachments/20161027/3dca5b94/attachment.html>


More information about the Infra mailing list