Re: [ovirt-devel] gerrit+ci improvement proposal

4 Jun 2015

      ----- Original Message -----
...
From: "Oved Ourfali" <ovedo@redhat.com>
To: "Eyal Edri" <eedri@redhat.com>
Cc: devel@ovirt.org, infra@ovirt.org
Sent: Thursday, June 4, 2015 10:03:02 AM
Subject: Re: [ovirt-devel] gerrit+ci improvement proposal
----- Original Message -----
...
From: "Eyal Edri" <eedri@redhat.com>
To: "Sandro Bonazzola" <sbonazzo@redhat.com>
Cc: infra@ovirt.org, devel@ovirt.org
Sent: Thursday, June 4, 2015 9:46:40 AM
Subject: Re: [ovirt-devel] gerrit+ci improvement proposal
----- Original Message -----
...
From: "Sandro Bonazzola" <sbonazzo@redhat.com>
To: "Eyal Edri" <eedri@redhat.com>, "Max Kovgan" <mkovgan@redhat.com>
Cc: devel@ovirt.org, infra@ovirt.org
Sent: Thursday, June 4, 2015 9:11:10 AM
Subject: Re: [ovirt-devel] gerrit+ci improvement proposal
Il 03/06/2015 21:46, Eyal Edri ha scritto:
...
----- Original Message -----
...
From: "Max Kovgan" <mkovgan@redhat.com>
To: devel@ovirt.org
Cc: infra@ovirt.org
Sent: Wednesday, June 3, 2015 8:22:54 PM
Subject: [ovirt-devel] gerrit+ci improvement proposal
Hi everyone!
We really want to have reliable and snappy CI: to allow short cycles
and
encourage developers to write tests.
# Problem
Many patches are neither ready for review nor for CI upon submission,
which
is OK.
But running all the jobs on those patches with limited resources
results
in:
overloaded resources, slow response time, unhappy developers.
# Proposed Solution
To run less jobs we know we don’t need to, thus making more resources
for
the
jobs we need to run.
We have been experimenting to make our CI stabler and quicker to
respond
by
using gerrit flags. This has improved in both directions very well
internally.
Now it seems a good time to let all the oVirt projects to use this.
This solution indirectly promotes reviews and quick tests - “to fail
early”,
yet full blown static code analysis and long tests to run “when
ready”.
# How it works
2 new gerrit independent flags are added to gerrit.
## CI flag
Will express patch CI status. Values:
 * +1 CI passed
 *  0 CI did not run yet
 * -1 CI failed
Permissions for setting: project maintainers (for special cases)
should
be
able to set/override (except Jenkins).
## Workflow flag
Will express patch “workflow” state. Values:
 *  0 Work In Progress
 * +1 Ready For Review
 * +2 Ready For Merge
Permissions for setting: Owner can set +1, Project Maintainers can set
+2
## Review + CI Integration:
Merging [“Submit” button to appear] will require: Review+1, CI+1,
Workflow+2
Patch lifecycle now is:
---------------------------------------------------------------
patch state   |owner     |reviewer |maintainer |CI tests |pass
---------------------------------------------------------------
added/updated |-         |-        |-          |quick    |CI+1
review        |Workflow+1|Review+1 |-          |heavy	 |CI+1
merge ready   |-         |-        |Workflow+2 |gating   |CI+1
merge         |-         |-        |merge      |merge    |CI+1
Changes from current workflow:
Owner only adds reviewers, now owner needs to set "Workflow+1" for the
patch
to be reviewed, and heavily auto-tested.
Maintainer now needs to set "Workflow+2" and wait for "Submit" button
to
appear after CI has completed running gating tests.
Next step will be to automate merge the change after Workflow+2 has
been
set
by the Maintainer and gating tests passed.
## Why now?
It is elimination of waste. The sooner - the better.
The solution has been used for a while and it works.
Resolving the problem without gerrit involved will lead to adding
unreliable
code into jobs, and will still be prone to problems:
  Just recently, 3d ago we’ve tried detecting what to run from jenkins
  relying only on gerrit comments so that upon Verified+1, we’d run
  the
  job.
  We could not use “Review+1”, because it makes no sense at all, so we
  left
  the job to set Verified+1.
  Meaning - re-trigger itself immediately more than 1 times.
Jenkins and its visitors very unhappy, and we had to stop those
  jobs,
  clean
  up the queue, and spam developers.
## OK OK OK. Now what?
Now we want your comments and opinions before pushing this further:
Please participate in this thread, so we can start trying it out.
Ask, Suggest better ideas, all this is welcome.
Best Regards!
N.B.
Of course, this is not written in stone, in case we find a better
approach
on
solving those issues, we will change to it.
And we will keep improving so don't be afraid that it will be
enforced:
if
this does not work out we will discard it.
P.S.
Kudos to dcaro, most of the work was done by him, and most of this
text
too.
+1 from me, releasing CI from running non critical and un-essential
jobs
will not only reduce load from ci,
and shorted response time for developers, it will allow us to add much
more
powerful tests such as functional & system
tests that actually add hosts and run VMs, improving our ability to
find
regression much more effectively.
Another benefit to consider is saving reviewers time. I.e not only
jenkins
benefits from Worklow+1, but also human reviewers.
Instead of looking at a patch that is too early to be reviewed, the
author
can set the Workflow+1 when the code is ready to review
(even if he didn't verified it yet), thus saving time to other
reviewers
-
for example people can add an email rule
to alert them only when they are added to patches that have Workflow+1,
and
not before.
For human reviewers I suggest to keep using drafts until the patch is
finished.
keep using? how many developers do you know are working with drafts until
their patch is ready?
i agree if everyone would use drafts load on jenkins was already much
lower,
unfortunately its not the case.
IMO we don't need the "workflow" flag.
I'm okay with CI not running on "drafts". And yes... we do use them.
We can try and educate people to use them more where needed.
Drafts should be widely used in first-phase development, and less on
bug-fixes.
In addition, I think the patch owners shouldn't add reviewers, unless they
need their input in the stage of the development.
Once they want input, they should add reviewers.
1. So, if the patch is draft then no CI runs on it.
2. Once it turns into non-draft, you can run "light-CI" on it.
3. Once the patch has at least one +1 from a (human) reviewer, then it should
run the "heavy" CI.
This is pointless, we will have to ask someone (or add another user) to +1 the
patch to enable the CI :-)
...
4. Once the patch has +1 from heavy CI, and +2 from reviewer (maintainer),
then it can be merged.
That's the process we have today, with slight change on when to run the CI
and what CI to run (no CI on drafts, light CI on non-draft, heavy CI on +1
patches).
I think Oved proposal is simpler and more useful.

However, we need a way to run *any* ci jobs even on a draft. If we cannot
afford this automatically with current system, lets add a way to trigger this
manually from gerrit.

So this becomes simply:

1. drafts do not run the ci automatically, but the owner can run the ci manually
2. published patches run the light ci jobs automatically, owner can run heavy jobs manually
3. run all ci jobs before merge, maintain can ignore ci results

This works with the system David suggested in the past, where each project 
implement scripts for checking patches and checking patches before merge.

Nir

Re: [ovirt-devel] gerrit+ci improvement proposal

Nir Soffer