Heartbeat issue caused channel being closed and set to null which caused a
surprise for a reactor.
It seems that we may want to schedule closure of channel in this situation
instead of calling it explicitly.
What was the cause of the heartbeat issues?
On Thu, Oct 27, 2016 at 4:12 AM, Allon Mureinik <amureini(a)redhat.com> wrote:
Seems like we cleared up the engine issues related to the recent
injection
changes.
I am now seeing stop errors, e.g.:
{"jsonrpc": "2.0", "id":
"ea0c564f-bc17-4fc2-8f1b-67c4d28257c6", "result":
{"cpuStatistics": {"1": {"cpuUser": "3.07",
"nodeIndex": 0, "cpuSys": "3.00", "cpuIdle":
"93.93"}, "0": {"cpuUser": "1.67",
"nodeIndex": 0, "cpuSys": "2.07", "cpuIdle":
"96.26"}}, "numaNodeMemFree": {"0": {"memPercent":
83, "memFree": "359"}}, "memShared": 0,
"thpState": "always", "ksmMergeAcrossNodes": true,
"vmCount": 0, "memUsed": "20", "storageDomains":
{"b2bb3220-1eb3-426a-90c2-5e236aefbe1a": {"code": 0,
"actual": true, "version": 0, "acquired": true,
"delay": "0.000840117", "lastCheck": "7.1",
"valid": true}, "3130195a-73f9-4490-b554-98a9205cead6":
{"code": 0, "actual": true, "version": 4,
"acquired": true, "delay": "0.00150771",
"lastCheck": "7.5", "valid": true},
"1a9e202b-83b7-4bdc-9b0c-e76b83676068": {"code": 0,
"actual": true, "version": 4, "acquired": true,
"delay": "0.000590956",
2016-10-26 21:51:09,878 DEBUG [org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
(DefaultQuartzScheduler7) [6d206bd1] Rescheduling
DEFAULT.org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods#-9223372036854775783
as there is no unfired trigger.
2016-10-26 21:51:28,705 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL
Stomp Reactor) [383dd6a0] Heartbeat exceeded. Closing channel
2016-10-26 21:51:28,708 DEBUG [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp
Reactor) [383dd6a0] Internal server error: null: java.lang.NullPointerException
at org.ovirt.vdsm.jsonrpc.client.reactors.SSLClient.write(SSLClient.java:102)
[vdsm-jsonrpc-java-client.jar:]
at
org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.processOutgoing(ReactorClient.java:245)
[vdsm-jsonrpc-java-client.jar:]
at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.process(ReactorClient.java:208)
[vdsm-jsonrpc-java-client.jar:]
at org.ovirt.vdsm.jsonrpc.client.reactors.SSLClient.process(SSLClient.java:125)
[vdsm-jsonrpc-java-client.jar:]
at org.ovirt.vdsm.jsonrpc.client.reactors.Reactor.processChannels(Reactor.java:89)
[vdsm-jsonrpc-java-client.jar:]
at org.ovirt.vdsm.jsonrpc.client.reactors.Reactor.run(Reactor.java:65)
[vdsm-jsonrpc-java-client.jar:]
Piotr - any idea?
On Wed, Oct 26, 2016 at 9:34 PM, Nadav Goldin <ngoldin(a)redhat.com> wrote:
> Its running now:
>
http://jenkins.ovirt.org/job/ovirt-engine_master_build-artif
> acts-fc24-x86_64/1037/,
> when it finishes it will trigger the deploy job which will trigger the
> experimental job. The build_artifacts job is triggered approximately
> every 2 hours(depends on the load).
> Inside the build-artifacts job you can see the last
>
commit(https://gerrit.ovirt.org/65776, which is already after your
> commit)
>
>
> On Wed, Oct 26, 2016 at 9:26 PM, Allon Mureinik <amureini(a)redhat.com>
> wrote:
> > @Infra - the last experimental job I saw was from ~17:30 local Israel
> time.
> > Any idea why another one isn't being triggered (or am I just being
> daft)?
> >
> > On Wed, Oct 26, 2016 at 6:27 PM, Allon Mureinik <amureini(a)redhat.com>
> wrote:
> >>
> >> Yipes.
> >> [1] should fix that, waiting for the CI to run to merge.
> >>
> >> [1]
https://gerrit.ovirt.org/#/c/65768/
> >>
> >> On Wed, Oct 26, 2016 at 3:42 PM, Nadav Goldin <ngoldin(a)redhat.com>
> wrote:
> >>>
> >>> Unfortunately it is still failing, see[1], the repository used was
> >>> built from commit [2]. If you want to check the logs same links
> >>> apply(just replace build number 2759->2782)
> >>>
> >>>
> >>>
> >>> [1]
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_ma
> ster/2782
> >>> [2]
https://gerrit.ovirt.org/#/c/65740/
> >>>
> >>> On Wed, Oct 26, 2016 at 11:33 AM, Tal Nisan <tnisan(a)redhat.com>
> wrote:
> >>> > Should be fixed now on master as those patches were just merged:
> >>> >
> >>> >
https://gerrit.ovirt.org/#/c/65738/ - "Move InjectorRule to
> VdsBroker":
> >>> > aligns the InjectorRule's package with the Injector so it can
> >>> > essentially be
> >>> > used wherever the Injector is used [+12, -4]
> >>> >
https://gerrit.ovirt.org/#/c/65739 - "InjectorRule: override
> injector
> >>> > anyway": Fixes a bug in InjectorRule where the Injector is
only
> mocked
> >>> > if
> >>> > you call InjectorRule.bind [+6, -1]
> >>> >
https://gerrit.ovirt.org/#/c/65740 - "core: InjectorRule for
> injecting
> >>> > members": Fixes a bug in InjectorRule to allow using
> >>> > Injector.injectMembers
> >>> > when using it [+18, -5]
> >>> >
https://gerrit.ovirt.org/#/c/65725 - "core: Fix
AuditLogging": The
> >>> > actual
> >>> > fix. Basically, goes over all the places that create an
> >>> > AuditLoggableBase
> >>> > that needs injecting and take care of it [+155, -160]
> >>> >
> >>> >
> >>> > On Wed, Oct 26, 2016 at 10:04 AM, Nadav Goldin
<ngoldin(a)redhat.com>
> >>> > wrote:
> >>> >>
> >>> >> Hi,
> >>> >> We have new failure on OST from patches merged to master
yesterday,
> >>> >> the failure started after the merge of [1], but as there were
> quite a
> >>> >> few patches merged quickly I can't make sure it is the one
causing
> >>> >> it(OST aren't ran per-patch).
> >>> >> The test that fails is [2] when attempting to start the VM.
> >>> >>
> >>> >> The error from the API side:
> >>> >>
> >>> >> RequestError:
> >>> >> status: 500
> >>> >> reason: Internal Server Error
> >>> >> detail: javax.ejb.EJBException: java.lang.NullPointerException
> >>> >> at
> >>> >>
> >>> >> org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(
> CMTTxInterceptor.java:213)
> >>> >> at
> >>> >>
> >>> >> org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInte
> rceptor.java:265)
> >>> >> at
> >>> >>
> >>> >> org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxIntercep
> tor.java:374)
> >>> >> at
> >>> >>
> >>> >> org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTT
> xInterceptor.java:243)
> >>> >> at
> >>> >>
> >>> >> org.jboss.invocation.InterceptorContext.proceed(InterceptorC
> ontext.java:340)
> >>> >> ....
> >>> >>
> >>> >> In the engine logs there are a few
'java.lang.NullPointerExceptio
> n'
> >>> >> errors:
> >>> >>
> >>> >> 2016-10-25 11:53:52,845 INFO
> >>> >> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLo
> gableBase]
> >>> >> (org.ovirt.thread.pool-6-thread-2) [5e6a88be] Failed to get
vds
> >>> >> 'd60db21f-95f0-487b-9f17-44861e2610a7', error: null
> >>> >> 2016-10-25 11:53:52,864 DEBUG
> >>> >> [org.ovirt.engine.core.utils.timer.FixedDelayJobListener]
> >>> >> (DefaultQuartzScheduler5) [] Rescheduling
> >>> >>
> >>> >>
> >>> >> DEFAULT.org.ovirt.engine.core.bll.tasks.AsyncTaskManager.tim
> erElapsed#-9223372036854775787
> >>> >> as there is no unfired trigger.
> >>> >> ...
> >>> >> 2016-10-25 11:53:52,845 DEBUG
> >>> >> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLo
> gableBase]
> >>> >> (org.ovirt.thread.pool-6-thread-2) [5e6a88be] Exception:
> >>> >> java.lang.NullPointerException
> >>> >> at
> >>> >>
> >>> >> org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLog
> ableBase.getVdsStatic(AuditLogableBase.java:633)
> >>> >> [dal.jar:]
> >>> >> at
> >>> >>
> >>> >> org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLog
> ableBase.getVdsName(AuditLogableBase.java:504)
> >>> >> [dal.jar:]
> >>> >> ...
> >>> >> 2016-10-25 11:53:52,837 INFO
> >>> >> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLo
> gableBase]
> >>> >> (org.ovirt.thread.pool-6-thread-2) [5e6a88be] Failed to get
vds
> >>> >> 'd60db21f-95f0-487b-9f17-44861e2610a7', error: null
> >>> >> 2016-10-25 11:53:52,837 DEBUG
> >>> >> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLo
> gableBase]
> >>> >> (org.ovirt.thread.pool-6-thread-2) [5e6a88be] Exception:
> >>> >> java.lang.NullPointerException
> >>> >> at
> >>> >>
> >>> >> org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLog
> ableBase.getVdsStatic(AuditLogableBase.java:633)
> >>> >> [dal.jar:]
> >>> >> at
> >>> >>
> >>> >> org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLog
> ableBase.getVdsName(AuditLogableBase.java:504)
> >>> >> [dal.jar:]
> >>> >> ...
> >>> >>
> >>> >> The full engine logs can be found here[3] and the entire test
suite
> >>> >> logs here[4].
> >>> >>
> >>> >> Can anyone have a look?
> >>> >>
> >>> >> Thanks,
> >>> >> Nadav.
> >>> >>
> >>> >>
> >>> >> [1]
https://gerrit.ovirt.org/#/c/65198/
> >>> >> [2]
> >>> >>
> >>> >>
https://github.com/oVirt/ovirt-system-tests/blob/master/
> basic_suite_master/test-scenarios/004_basic_sanity.py#L322
> >>> >> [3]
> >>> >>
> >>> >>
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_ma
> ster/2759/artifact/exported-artifacts/basic_suite_master.sh-
> fc24/exported-artifacts/test_logs/basic_suite_master/post-
> 004_basic_sanity.py/*zip*/post-004_basic_sanity.py.zip
> >>> >> [4]
> >>> >>
> >>> >>
http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_ma
> ster/2759/artifact/exported-artifacts/basic_suite_master.sh-
> fc24/exported-artifacts/
> >>> >> _______________________________________________
> >>> >> Devel mailing list
> >>> >> Devel(a)ovirt.org
> >>> >>
http://lists.ovirt.org/mailman/listinfo/devel
> >>> >
> >>> >
> >>
> >>
> >
>