[JIRA] (OVIRT-938) Fix Jenkins slave connection dying on vdsm check_merged jobs
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-938?page=com.atlassian.jira... ]
Barak Korren commented on OVIRT-938:
------------------------------------
Trying to get more info about this, here are the journal lines we get on the slave when the job fails:
{code}
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org libvirtd[17504]: End of file while reading data: Input/output error
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: USER_END pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=1012 exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: USER_LOGOUT pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=1012 exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org kernel: audit: type=1106 audit(1482333814.930:3694): pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=1012 exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org kernel: audit: type=1113 audit(1482333814.930:3695): pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=1012 exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org kernel: audit: type=1112 audit(1482333814.952:3696): pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=1012 exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org kernel: audit: type=1105 audit(1482333814.952:3697): pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=1012 exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: USER_LOGIN pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=1012 exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: USER_START pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=1012 exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: CRYPTO_KEY_USER pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:95:42:59:76:99:08:09:ae:94:d8:46:3e:61:7d:53:b6:e5:1e:f5:f5:73:f2:04:87:38:ca:80:4f:44:6d:df:d5 direction=? spid=25269 suid=1012 exe="/usr/sbin/sshd" hostname=? addr=66.187.230.92 terminal=? res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org kernel: audit: type=2404 audit(1482333814.953:3698): pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:95:42:59:76:99:08:09:ae:94:d8:46:3e:61:7d:53:b6:e5:1e:f5:f5:73:f2:04:87:38:ca:80:4f:44:6d:df:d5 direction=? spid=25269 suid=1012 exe="/usr/sbin/sshd" hostname=? addr=66.187.230.92 terminal=? res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org sshd[5421]: Received disconnect from 66.187.230.92 port 53024:11: Closed due to user request.
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org kernel: audit: type=2404 audit(1482333814.989:3699): pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:95:42:59:76:99:08:09:ae:94:d8:46:3e:61:7d:53:b6:e5:1e:f5:f5:73:f2:04:87:38:ca:80:4f:44:6d:df:d5 direction=? spid=5421 suid=1012 exe="/usr/sbin/sshd" hostname=? addr=66.187.230.92 terminal=? res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org kernel: audit: type=2404 audit(1482333814.989:3700): pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=session fp=? direction=both spid=5421 suid=1012 rport=53024 laddr=172.19.12.136 lport=22 exe="/usr/sbin/sshd" hostname=? addr=66.187.230.92 terminal=? res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: CRYPTO_KEY_USER pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:95:42:59:76:99:08:09:ae:94:d8:46:3e:61:7d:53:b6:e5:1e:f5:f5:73:f2:04:87:38:ca:80:4f:44:6d:df:d5 direction=? spid=5421 suid=1012 exe="/usr/sbin/sshd" hostname=? addr=66.187.230.92 terminal=? res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: CRYPTO_KEY_USER pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=session fp=? direction=both spid=5421 suid=1012 rport=53024 laddr=172.19.12.136 lport=22 exe="/usr/sbin/sshd" hostname=? addr=66.187.230.92 terminal=? res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org sshd[5421]: Disconnected from 66.187.230.92 port 53024
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org kernel: audit: type=1106 audit(1482333814.995:3701): pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_lastlog acct="jenkins" exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org kernel: audit: type=1104 audit(1482333814.995:3702): pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="jenkins" exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org kernel: audit: type=1106 audit(1482333814.996:3703): pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=1012 exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: USER_END pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_lastlog acct="jenkins" exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: CRED_DISP pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="jenkins" exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: USER_END pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=1012 exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: USER_LOGOUT pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=1012 exe="/usr/sbin/sshd" hostname=66.187.230.92 addr=66.187.230.92 terminal=ssh res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: CRYPTO_KEY_USER pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:c1:30:1f:20:35:5f:1a:4a:cc:9f:c9:eb:05:0c:0f:41:06:32:52:d7:d3:e9:fc:0a:aa:ae:f4:ce:98:15:b3:4c direction=? spid=5410 suid=0 exe="/usr/sbin/sshd" hostname=? addr=66.187.230.92 terminal=? res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: CRYPTO_KEY_USER pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:3d:5f:aa:8f:5f:6d:01:ba:8a:6c:ca:2a:ae:4e:78:f6:19:cd:44:7c:df:6a:4d:58:ef:be:16:c0:41:b9:a8:a4 direction=? spid=5410 suid=0 exe="/usr/sbin/sshd" hostname=? addr=66.187.230.92 terminal=? res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org audit[5410]: CRYPTO_KEY_USER pid=5410 uid=0 auid=1012 ses=3 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:95:42:59:76:99:08:09:ae:94:d8:46:3e:61:7d:53:b6:e5:1e:f5:f5:73:f2:04:87:38:ca:80:4f:44:6d:df:d5 direction=? spid=5410 suid=0 exe="/usr/sbin/sshd" hostname=? addr=66.187.230.92 terminal=? res=success'
Dec 21 15:23:34 vm0136.workers-phx.ovirt.org sshd[5410]: pam_unix(sshd:session): session closed for user jenkins
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd-logind[444]: Removed session 3.
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[1]: Stopping User Manager for UID 1012...
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[5414]: Stopped target Default.
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[5414]: Stopped target Basic System.
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[5414]: Stopped target Timers.
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[5414]: Stopped target Paths.
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[5414]: Stopped target Sockets.
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[5414]: Closed D-Bus User Message Bus Socket.
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[5414]: Reached target Shutdown.
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[5414]: Starting Exit the Session...
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[5414]: Received SIGRTMIN+24 from PID 25273 (kill).
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[1]: Stopped User Manager for UID 1012.
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1012 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 21 15:23:35 vm0136.workers-phx.ovirt.org systemd[1]: Removed slice User Slice of jenkins.
{code}
> Fix Jenkins slave connection dying on vdsm check_merged jobs
> ------------------------------------------------------------
>
> Key: OVIRT-938
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-938
> Project: oVirt - virtualization made easy
> Issue Type: Bug
> Components: Jenkins
> Reporter: Barak Korren
> Assignee: infra
>
> Something in the vdsm build_artifacs job makes the Jenkins slave disconnect when it is running. This in turn makes the cleanup scripts not run on the slave leaving it dirty enough to make the next job on that slave fail.
> Example of this can be seen here:
> http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/692/console
> Relevant log lines:
> {code}
> 21:49:00 Ran 44 tests in 1231.988s
> 21:49:00
> 21:49:00 OK
> 21:49:00 + return 0
> 21:49:00 sh: [13086: 1 (255)] tcsetattr: Inappropriate ioctl for device
> 21:49:00 Took 2464 seconds
> 21:49:00 ===================================
> 21:49:00 logout
> 21:49:01 Slave went offline during the build
> 21:49:01 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
> 21:49:01 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
> 21:49:01 Caused by: java.io.EOFException
> 21:49:01 at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
> 21:49:01 at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
> 21:49:01 at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
> 21:49:01 at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
> 21:49:01 at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
> 21:49:01 at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
> 21:49:01 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
> 21:49:01
> 21:49:01 Build step 'Execute shell' marked build as failure
> 21:49:01 Performing Post build task...
> 21:49:01 Match found for :.* : True
> 21:49:01 Logical operation result is TRUE
> 21:49:01 Running script : #!/bin/bash -x
> 21:49:01 echo "shell-scripts/mock_cleanup.sh"
> ... SNIP ...
> 21:49:01 Exception when executing the batch command : no workspace from node hudson.slaves.DumbSlave[fc24-vm06.phx.ovirt.org] which is computer hudson.slaves.SlaveComputer@30863c81 and has channel null
> 21:49:01 Build step 'Post build task' marked build as failure
> 21:49:02 ERROR: Step ?Archive the artifacts? failed: no workspace for vdsm_master_check-merged-el7-x86_64 #692
> 21:49:02 ERROR: Failed to evaluate groovy script.
> 21:49:02 java.lang.NullPointerException: Cannot invoke method child() on null object
> 21:49:02 at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:77)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:45)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:32)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
> 21:49:02 at Script1.run(Script1.groovy:2)
> 21:49:02 at groovy.lang.GroovyShell.evaluate(GroovyShell.java:580)
> 21:49:02 at groovy.lang.GroovyShell.evaluate(GroovyShell.java:618)
> 21:49:02 at groovy.lang.GroovyShell.evaluate(GroovyShell.java:589)
> 21:49:02 at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SecureGroovyScript.evaluate(SecureGroovyScript.java:166)
> 21:49:02 at org.jvnet.hudson.plugins.groovypostbuild.GroovyPostbuildRecorder.perform(GroovyPostbuildRecorder.java:361)
> 21:49:02 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> 21:49:02 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
> 21:49:02 at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:723)
> 21:49:02 at hudson.model.Build$BuildExecution.post2(Build.java:185)
> 21:49:02 at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:668)
> 21:49:02 at hudson.model.Run.execute(Run.java:1763)
> 21:49:02 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> 21:49:02 at hudson.model.ResourceController.execute(ResourceController.java:98)
> 21:49:02 at hudson.model.Executor.run(Executor.java:410)
> 21:49:02 Build step 'Groovy Postbuild' marked build as failure
> 21:49:02 Started calculate disk usage of build
> 21:49:02 Finished Calculation of disk usage of build in 0 seconds
> 21:49:02 Finished: FAILURE
> {code}
--
This message was sent by Atlassian JIRA
(v1000.621.5#100023)
7 years, 11 months
[JIRA] (OVIRT-955) Re: http://lists.ovirt.org/pipermail/announce/ is not reachable
by Marc Dequènes (Duck) (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-955?page=com.atlassian.jira... ]
Marc Dequènes (Duck) commented on OVIRT-955:
--------------------------------------------
yes, a few days ago. the progress was on the migration ticket.
> Re: http://lists.ovirt.org/pipermail/announce/ is not reachable
> ---------------------------------------------------------------
>
> Key: OVIRT-955
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-955
> Project: oVirt - virtualization made easy
> Issue Type: By-EMAIL
> Reporter: sbonazzo
> Assignee: infra
>
> Tested it, I've access forbidden there:
> http://lists.phx.ovirt.org/pipermail/announce/
> archives should be publicly accessible.
> On Mon, Dec 12, 2016 at 2:21 PM, Evgheni Dereveanchin <ederevea(a)redhat.com>
> wrote:
> > I totally agree with Barak on this. The hostname
> > must work as there's a bunch of links indexed
> > by search engine which lead to it.
> > In general the PHX one should not be used
> > by end users. I've re-added the CNAME and edited
> > the virtualhost on apache so it should be available
> > again once existing DNS caches expire.
> >
> > Regards,
> > Evgheni Dereveanchin
> >
> > ----- Original Message -----
> > From: "Barak Korren" <bkorren(a)redhat.com>
> > To: "Eyal Edri" <eedri(a)redhat.com>
> > Cc: "infra" <infra(a)ovirt.org>, "Sandro Bonazzola" <sbonazzo(a)redhat.com>,
> > "Evgheni Dereveanchin" <ederevea(a)redhat.com>, "duck" <duck(a)redhat.com>,
> > infra-owner(a)ovirt.org
> > Sent: Monday, 12 December, 2016 2:05:57 PM
> > Subject: Re: http://lists.ovirt.org/pipermail/announce/ is not reachable
> >
> > It can be reached now at lists.phx.ovirt.org (but we we need a CNAME
> > from the old name)
> >
> > On 12 December 2016 at 15:03, Eyal Edri <eedri(a)redhat.com> wrote:
> > > I'm not sure who gets emails sent to infra-owner list. best to send it
> > to
> > > infra or infra support.
> > >
> > > Adding Evgheni and Duck who are working on migration of the mm server.
> > >
> > > On Dec 12, 2016 2:58 PM, "Sandro Bonazzola" <sbonazzo(a)redhat.com> wrote:
> > >>
> > >> The lists server is not reachable anymore. Looks like a dns issue:
> > >>
> > >> $ traceroute lists.ovirt.org
> > >> lists.ovirt.org: Name or service not known
> > >>
> > >>
> > >> --
> > >> Sandro Bonazzola
> > >> Better technology. Faster innovation. Powered by community
> > collaboration.
> > >> See how it works at redhat.com
> > >
> > >
> > > _______________________________________________
> > > Infra mailing list
> > > Infra(a)ovirt.org
> > > http://lists.phx.ovirt.org/mailman/listinfo/infra
> > >
> >
> >
> >
> > --
> > Barak Korren
> > bkorren(a)redhat.com
> > RHCE, RHCi, RHV-DevOps Team
> > https://ifireball.wordpress.com/
> >
> --
> Sandro Bonazzola
> Better technology. Faster innovation. Powered by community collaboration.
> See how it works at redhat.com
--
This message was sent by Atlassian JIRA
(v1000.621.5#100023)
7 years, 11 months