Barak Korren updated OVIRT-1984:
--------------------------------
Resolution: Won't Fix
Status: Done (was: To Do)
Create "out-of-band" slave cleanup and setup jobs
-------------------------------------------------
Key: OVIRT-1984
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1984
Project: oVirt - virtualization made easy
Issue Type: New Feature
Components: Jenkins Slaves
Reporter: Barak Korren
Assignee: infra
Right now, we run slave cleanup and setup steps as part of every single job we run. This
has several shortcomings:
# It takes a long time from the point a user submitted a patch to the point his actual
test or build code runs
# If slave setup or cleanup steps fail - they fail the whole job for the user
# If slave setup or cleanup steps fail - they can keep failing for many jobs until the CI
team intervenes manually
# There is a "chicken and egg" issue where some parts of the CI code have to
run before the slave has been properly cleaned up and configured. This makes it harder to add
new slaves to the system.
Here is a suggested scheme to fix all this:
# Label all slaves that should be cleaned up automatically as 'cleanable'. This
is mostly to prevent the jobs described here from operating on the master node.
# Have a "cleanup scheduler" job that finds all slaves labelled as
"cleanable" but not as "dirty" or "clean", labels them as
"dirty" and runs a cleanup job on them.
# Have a "cleanup" job that is triggered on particular slaves by the
"cleanup scheduler" job, runs cleanup and setup steps on them and then labels
them as "clean" and removes the "dirty" label.
# Have all other CI jobs only use slaves with the "clean" label.
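The label transitions in the scheme above can be modelled as a small state machine. The sketch below is illustrative only: it keeps each slave's labels in a plain Python set instead of talking to Jenkins, and the function names (schedule_cleanups, run_cleanup, usable_by_ci_jobs) are hypothetical, not part of any existing oVirt CI code.

```python
from typing import Callable, Dict, List, Set

# Label names taken from the proposal.
CLEANABLE = "cleanable"
DIRTY = "dirty"
CLEAN = "clean"


def schedule_cleanups(slaves: Dict[str, Set[str]]) -> List[str]:
    """'cleanup scheduler': pick cleanable slaves that are neither dirty nor
    clean, mark them dirty, and return the names to run the cleanup job on."""
    to_clean = []
    for name, labels in sorted(slaves.items()):
        if CLEANABLE in labels and DIRTY not in labels and CLEAN not in labels:
            # Marking the slave dirty stops the scheduler from triggering
            # a second cleanup before the first one has started.
            labels.add(DIRTY)
            to_clean.append(name)
    return to_clean


def run_cleanup(slaves: Dict[str, Set[str]], name: str,
                cleanup_steps: Callable[[str], None]) -> bool:
    """'cleanup' job: run the (hypothetical) cleanup/setup callable and only
    mark the slave clean if it succeeds."""
    labels = slaves[name]
    try:
        cleanup_steps(name)
    except Exception:
        # On failure the slave keeps 'dirty' and never gets 'clean',
        # so regular CI jobs will not be scheduled on it.
        return False
    labels.add(CLEAN)
    labels.discard(DIRTY)
    return True


def usable_by_ci_jobs(slaves: Dict[str, Set[str]]) -> List[str]:
    """Regular CI jobs only ever select slaves carrying the 'clean' label."""
    return sorted(name for name, labels in slaves.items() if CLEAN in labels)
```

The proposal does not say how the "clean" label is removed once a regular job has used a slave, or whether a slave whose cleanup failed should be retried; in this sketch a failed slave simply stays "dirty" until someone intervenes.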
Notes:
# The "dirty" label is there to make the "cleanup scheduler" job not
trigger twice on the same slave before the "cleanup" job has started cleaning it up.
# Since all slaves used by the real jobs will always be clean - there will no longer be a
need to run cleanup steps in the real jobs, thus saving time.
# If cleanup steps fail - the cleanup job will fail and the slave will not be marked as
"clean" so real jobs will never try to use it.
# To solve the "chicken and egg" issue, the cleanup job probably must be a
FreeStyle job and all the cleanup and setup code must be embedded into it by JJB. This
will probably require a newer version of JJB than what we have, so I am setting OVIRT-1983 as a
blocker.
# There is an issue of how to make CI for this - if cleanup and setup steps are removed
from the normal STDCI jobs, then they will not be checked by the "check-patch"
job of the 'jenkins' repo. Here is a suggested scheme to solve this:
## Have a way to "loan" slaves from the production jenkins to other Jenkins
instances - this could be done by having a job that starts up the Jenkins JNLP client and
tells it to connect to another Jenkins master.
## As part of the "check-patch" job for the 'jenkins' repo - start a
Jenkins master in a container, attach some production slaves to it and have it run
cleanup and setup steps on them.
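The "loan" step could, for example, be a job on the production slave that launches the Jenkins inbound (JNLP) agent pointed at the other master. This is a minimal sketch under several assumptions: the target master already has a node defined for the loaned slave with an inbound launch method, the agent jar has been downloaded from that master (it is named agent.jar in current Jenkins releases and slave.jar in older ones), and the JNLP URL and secret are copied from that node's page. All names and values here are hypothetical.

```python
import subprocess
from typing import List


def build_loan_command(agent_jar: str, jnlp_url: str, secret: str) -> List[str]:
    """Build the command that connects this slave to another Jenkins master.

    jnlp_url and secret are assumptions: take them from the node's page on
    the master the slave is being loaned to.
    """
    return ["java", "-jar", agent_jar, "-jnlpUrl", jnlp_url, "-secret", secret]


def loan_slave(agent_jar: str, jnlp_url: str, secret: str) -> int:
    """Run the agent in the foreground; it returns when the connection ends."""
    cmd = build_loan_command(agent_jar, jnlp_url, secret)
    return subprocess.run(cmd, check=False).returncode
```

Running the agent in the foreground keeps the loaning job busy for as long as the slave is attached elsewhere, which also stops the production master from scheduling other work on that slave in the meantime.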