[JIRA] (OVIRT-296) [jenkins] take offline faulty bad slaves
by eyal edri [Administrator] (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-296?page=com.atlassian.jira... ]
eyal edri [Administrator] reassigned OVIRT-296:
-----------------------------------------------
Assignee: Evgheni Dereveanchin (was: infra)
> [jenkins] take offline faulty bad slaves
> ----------------------------------------
>
> Key: OVIRT-296
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-296
> Project: oVirt - virtualization made easy
> Issue Type: Task
> Components: Jenkins
> Affects Versions: Test
> Reporter: eyal edri [Administrator]
> Assignee: Evgheni Dereveanchin
> Labels: jenkins, monitoring
>
> Quite often a specific slave on phx starts misbehaving for various reasons (out of disk space, git problems, network problems, etc.),
> which leads to multiple jobs trying to run on it and failing.
> We need an automated way of detecting this.
> Proposal:
> Add a Groovy post-build step to jobs that takes a slave offline if it misbehaves, using:
> manager.build.getBuiltOn().toComputer().setTemporarilyOffline(true)
> The trick is to find such a slave and to know whether it has failed consistently over the past X hours, to justify disabling it.
> We need some sort of counter or service that tracks slaves and their error state and, based on that, takes a specific slave offline.
> For example:
> if X jobs failed on a slave within Y time and each ran for less than Z minutes, that might indicate such a problem
> (e.g. 10 jobs failed on the same slave within 5 minutes and each job ran for less than 1 minute).
> The post-build script should email infra(a)ovirt.org to say that it disabled a slave and that we should look into it.
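The counter described above could be sketched roughly as follows. This is only an illustration of the "x failures in Y time with runtime under Z" heuristic, not an existing service: the names BuildResult and SlaveFailureTracker, the default thresholds (taken from the example in the description), and the choice to reset the count after a successful build are all assumptions. Python is used here for brevity; the real check would have to be fed from the Groovy post-build step.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    """One finished build (hypothetical record, not a Jenkins type)."""
    slave: str          # name of the node the build ran on
    finished_at: float  # completion time, seconds since the epoch
    duration: float     # build runtime in seconds
    failed: bool


class SlaveFailureTracker:
    """Flags a slave after too many quick failures inside a sliding window."""

    def __init__(self, max_failures=10, window=300.0, max_duration=60.0):
        self.max_failures = max_failures  # "x jobs"
        self.window = window              # "Y time", in seconds
        self.max_duration = max_duration  # "Z min", in seconds
        self._quick_failures = defaultdict(deque)

    def record(self, result):
        """Record one build; return True if its slave should go offline."""
        recent = self._quick_failures[result.slave]
        if not result.failed:
            # Assumption: a successful build means the slave has recovered.
            recent.clear()
            return False
        if result.duration >= self.max_duration:
            # Long-running failures are more likely real test failures.
            return False
        recent.append(result.finished_at)
        # Drop failures that fell out of the time window.
        while recent and result.finished_at - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.max_failures
```

When record() returns True, the caller would take the slave offline and send the notification mail.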
--
This message was sent by Atlassian JIRA
(v1000.621.5#100023)
[JIRA] (OVIRT-20) Document oVirt infra SLA
by eyal edri [Administrator] (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-20?page=com.atlassian.jira.... ]
eyal edri [Administrator] reassigned OVIRT-20:
----------------------------------------------
Assignee: Evgheni Dereveanchin (was: infra)
> Document oVirt infra SLA
> -------------------------
>
> Key: OVIRT-20
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-20
> Project: oVirt - virtualization made easy
> Issue Type: Task
> Components: General
> Reporter: quaid
> Assignee: Evgheni Dereveanchin
> Priority: Highest
>
> We need to document how we make changes:
> * What services need notices out to the world?
> * What are the timeframes for notices?
>   - 1 hour for critical
>   - 12 to 24 hours for non-critical
> * Who needs to be notified for what?
>   - e.g. leave announce out of administrivia
>   - arch, users, and infra are the defaults so far
>   - perhaps create an infra-announce list that just includes the others?
> * When do we not make changes?
>   - change freeze around releases
>   - other change freeze reasons
> * What is the process to make a change during a freeze?
>   - When is the freeze slushy?
>   - When is the freeze solid and should not be broken because of the potential risk at that moment? (We need something to weigh risk against risk.)
[JIRA] (OVIRT-935) Unable to build fc25 chroots with mock
by eyal edri [Administrator] (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-935?page=com.atlassian.jira... ]
eyal edri [Administrator] reassigned OVIRT-935:
-----------------------------------------------
Assignee: bkorren (was: infra)
> Unable to build fc25 chroots with mock
> --------------------------------------
>
> Key: OVIRT-935
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-935
> Project: oVirt - virtualization made easy
> Issue Type: Improvement
> Reporter: Nadav Goldin
> Assignee: bkorren
>
> Mock fails when trying to build the chroot for Fedora 25, at the stage where the '@buildsys-build' group is installed (with dnf).
> Open bug: https://bugzilla.redhat.com/show_bug.cgi?id=1360781
> Temporary workaround: add the 'libcrypt' package to the '.packages.fc25' file.
> Possible solution: patch the mock-*fc25*.cfg file in the Jenkins repository to always install 'libcrypt' first.
> Until this is fixed, it blocks us from building anything with mock on fc25 (I hit it when trying to build fc25 RPMs for lago).
> Error logs:
> {code}
> DEBUG util.py:502: Executing command: ['/usr/bin/yum-deprecated', '--installroot', '/var/lib/mock/fedora-25-x86_64-2474d86945a1de9c6d14549ec2401b9c-8291/root/', '--releasever', '25', 'install', '@buildsys-build', 'git', 'python', 'python-dulwich', 'python-setuptools', 'yum', 'yum-utils', '--setopt=tsflags=nocontexts'] with env {'PS1': '<mock-chroot> \\s-\\v\\$ ',
> ....
> DEBUG util.py:421: Yum command has been deprecated, use dnf instead.
> DEBUG util.py:421: See 'man dnf' and 'man yum2dnf' for more information.
> DEBUG util.py:421: Error: libcrypt conflicts with libcrypt-nss-2.24-3.fc25.x86_64
> DEBUG util.py:421: Error: libcrypt-nss conflicts with libcrypt-2.24-3.fc25.x86_64
> DEBUG util.py:421: You could try using --skip-broken to work around the problem
> DEBUG util.py:421: You could try running: rpm -Va --nofiles --nodigest
> DEBUG util.py:557: Child return code was: 1
> DEBUG util.py:180: kill orphans
> {code}
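As a rough illustration of the "install 'libcrypt' first" idea, the helper below reorders a package list so that a given package comes first and appears only once. It assumes the '.packages.fc25' file holds one package name per line; that format, the function name with_package_first, and whether ordering within that file actually affects the install are assumptions, not something stated in the ticket.

```python
def with_package_first(lines, package="libcrypt"):
    """Return the package names with `package` as the first entry, exactly once.

    `lines` is an iterable of strings, assumed to be one package name per line.
    Blank lines are dropped.
    """
    others = []
    for line in lines:
        name = line.strip()
        if name and name != package:
            others.append(name)
    return [package] + others
```

For example, with_package_first(["git", "python"]) gives a list starting with 'libcrypt', followed by the original entries in their original order.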
[JIRA] (OVIRT-938) Fix Jenkins slave connection dying on vdsm check_merged jobs
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-938?page=com.atlassian.jira... ]
Barak Korren commented on OVIRT-938:
------------------------------------
Got this from the slave log in Jenkins:
{code}
ERROR: Connection terminated
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Slave JVM has terminated. Exit signal=TERM
[12/21/16 15:23:34] [SSH] Connection closed.
{code}
> Fix Jenkins slave connection dying on vdsm check_merged jobs
> ------------------------------------------------------------
>
> Key: OVIRT-938
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-938
> Project: oVirt - virtualization made easy
> Issue Type: Bug
> Components: Jenkins
> Reporter: Barak Korren
> Assignee: infra
>
> Something in the vdsm build_artifacts job makes the Jenkins slave disconnect while the job is running. As a result, the cleanup scripts do not run on the slave, leaving it dirty enough to make the next job on that slave fail.
> An example of this can be seen here:
> http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/692/console
> Relevant log lines:
> {code}
> 21:49:00 Ran 44 tests in 1231.988s
> 21:49:00
> 21:49:00 OK
> 21:49:00 + return 0
> 21:49:00 sh: [13086: 1 (255)] tcsetattr: Inappropriate ioctl for device
> 21:49:00 Took 2464 seconds
> 21:49:00 ===================================
> 21:49:00 logout
> 21:49:01 Slave went offline during the build
> 21:49:01 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
> 21:49:01 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
> 21:49:01 Caused by: java.io.EOFException
> 21:49:01 at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
> 21:49:01 at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
> 21:49:01 at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
> 21:49:01 at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
> 21:49:01 at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
> 21:49:01 at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
> 21:49:01 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
> 21:49:01
> 21:49:01 Build step 'Execute shell' marked build as failure
> 21:49:01 Performing Post build task...
> 21:49:01 Match found for :.* : True
> 21:49:01 Logical operation result is TRUE
> 21:49:01 Running script : #!/bin/bash -x
> 21:49:01 echo "shell-scripts/mock_cleanup.sh"
> ... SNIP ...
> 21:49:01 Exception when executing the batch command : no workspace from node hudson.slaves.DumbSlave[fc24-vm06.phx.ovirt.org] which is computer hudson.slaves.SlaveComputer@30863c81 and has channel null
> 21:49:01 Build step 'Post build task' marked build as failure
> 21:49:02 ERROR: Step ‘Archive the artifacts’ failed: no workspace for vdsm_master_check-merged-el7-x86_64 #692
> 21:49:02 ERROR: Failed to evaluate groovy script.
> 21:49:02 java.lang.NullPointerException: Cannot invoke method child() on null object
> 21:49:02 at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:77)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:45)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:32)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
> 21:49:02 at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
> 21:49:02 at Script1.run(Script1.groovy:2)
> 21:49:02 at groovy.lang.GroovyShell.evaluate(GroovyShell.java:580)
> 21:49:02 at groovy.lang.GroovyShell.evaluate(GroovyShell.java:618)
> 21:49:02 at groovy.lang.GroovyShell.evaluate(GroovyShell.java:589)
> 21:49:02 at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SecureGroovyScript.evaluate(SecureGroovyScript.java:166)
> 21:49:02 at org.jvnet.hudson.plugins.groovypostbuild.GroovyPostbuildRecorder.perform(GroovyPostbuildRecorder.java:361)
> 21:49:02 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> 21:49:02 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
> 21:49:02 at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:723)
> 21:49:02 at hudson.model.Build$BuildExecution.post2(Build.java:185)
> 21:49:02 at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:668)
> 21:49:02 at hudson.model.Run.execute(Run.java:1763)
> 21:49:02 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> 21:49:02 at hudson.model.ResourceController.execute(ResourceController.java:98)
> 21:49:02 at hudson.model.Executor.run(Executor.java:410)
> 21:49:02 Build step 'Groovy Postbuild' marked build as failure
> 21:49:02 Started calculate disk usage of build
> 21:49:02 Finished Calculation of disk usage of build in 0 seconds
> 21:49:02 Finished: FAILURE
> {code}
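Since the cleanup scripts do not run once the connection is gone, it may help to recognise such builds from their console output so the slave can be cleaned before it is reused. The sketch below only does simple substring matching; the marker strings are copied from the log above, while the function name slave_disconnected and the idea of scanning console text this way are assumptions.

```python
# Messages that appear in the console log above when the slave connection drops.
DISCONNECT_MARKERS = (
    "Slave went offline during the build",
    "Unexpected termination of the channel",
)


def slave_disconnected(console_text):
    """Return True if the console text contains any known disconnect marker."""
    return any(marker in console_text for marker in DISCONNECT_MARKERS)
```

A build for which this returns True most likely left its slave without cleanup, so that slave is a candidate for a manual or scheduled cleanup run.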
Fwd: Fedora 23 End Of Life
by Sandro Bonazzola
FYI.
---------- Forwarded message ----------
From: "Mohan Boddu" <mboddu(a)redhat.com>
Date: 21 Dec 2016 04:05
Subject: Fedora 23 End Of Life
To: <announce(a)lists.fedoraproject.org>, <test-announce(a)lists.fedoraproject.org>, <devel-announce(a)lists.fedoraproject.org>
Cc:
As of the 20th of December 2016, Fedora 23 has reached its end of life
for updates and support. No further updates, including security
updates, will be available for Fedora 23. A previous reminder was sent
on 28th of November 2016 [0]. Fedora 24 will continue to receive
updates until approximately one month after the release of Fedora 26.
The maintenance schedule of Fedora releases is documented on the
Fedora Project wiki [1]. The Fedora Project wiki also contains
instructions [2] on how to upgrade from a previous release of Fedora
to a version receiving updates.
Mohan Boddu.
[0] https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/HLHKRTIB33EDZXP624GHF2OZLHWAGKSJ/#Q5O44X4BEBOYEKAEVLSXVI44DSNVHBYG
[1] https://fedoraproject.org/wiki/Fedora_Release_Life_Cycle#Maintenance_Schedule
[2] https://fedoraproject.org/wiki/Upgrading?rd=DistributionUpgrades