----- Original Message -----
From: "David Caro" <dcaroest(a)redhat.com>
To: "Nir Soffer" <nsoffer(a)redhat.com>
Cc: "Max Kovgan" <mkovgan(a)redhat.com>, infra(a)ovirt.org, devel(a)ovirt.org
Sent: Thursday, June 4, 2015 10:35:12 AM
Subject: Re: [ovirt-devel] gerrit+ci improvement proposal
On 06/04, Nir Soffer wrote:
> ----- Original Message -----
> > From: "Max Kovgan" <mkovgan(a)redhat.com>
> > To: devel(a)ovirt.org
> > Cc: infra(a)ovirt.org
> > Sent: Wednesday, June 3, 2015 8:22:54 PM
> > Subject: [ovirt-devel] gerrit+ci improvement proposal
> >
> > Hi everyone!
> > We really want to have reliable and snappy CI: to allow short cycles and
> > encourage developers to write tests.
> >
> > # Problem
> >
> > Many patches are neither ready for review nor for CI upon submission,
> > which
> > is OK.
> > But running all the jobs on those patches with limited resources results
> > in:
> > overloaded resources, slow response time, unhappy developers.
> >
> > # Proposed Solution
> >
> > To run less jobs we know we don’t need to, thus making more resources for
> > the
> > jobs we need to run.
> > We have been experimenting to make our CI stabler and quicker to respond
> > by
> > using gerrit flags. This has improved in both directions very well
> > internally.
> > Now it seems a good time to let all the oVirt projects to use this.
> > This solution indirectly promotes reviews and quick tests - “to fail
> > early”,
> > yet full blown static code analysis and long tests to run “when ready”.
> >
> > # How it works
> >
> > 2 new gerrit independent flags are added to gerrit.
> >
> > ## CI flag
> >
> > Will express patch CI status. Values:
> > * +1 CI passed
> > * 0 CI did not run yet
> > * -1 CI failed
>
> -1
>
> You must have CI error state. Most of the errors I have seen, the CI fail
> to run,
> and the failure is not related to the tested patch.
You'll have to prove this, as the last time we discussed it less than 10% of
the failures from the previous 2 weeks were that case.
10% is 10% too much :-)
Also that's why maintainers can set +1 here and -1 is not a
blocker.
Ok, than its fine.
> > Permissions for setting: project maintainers (for special
cases) should
> > be
> > able to set/override (except Jenkins).
> >
> > ## Workflow flag
> >
> > Will express patch “workflow” state. Values:
> > * 0 Work In Progress
> > * +1 Ready For Review
> > * +2 Ready For Merge
> > Permissions for setting: Owner can set +1, Project Maintainers can set +2
> >
> > ## Review + CI Integration:
> >
> > Merging [“Submit” button to appear] will require: Review+1, CI+1,
> > Workflow+2
> > Patch lifecycle now is:
> > ---------------------------------------------------------------
> > patch state |owner |reviewer |maintainer |CI tests |pass
> > ---------------------------------------------------------------
> > added/updated |- |- |- |quick |CI+1
> > review |Workflow+1|Review+1 |- |heavy |CI+1
> > merge ready |- |- |Workflow+2 |gating |CI+1
> > merge |- |- |merge |merge |CI+1
>
> -1
>
> Cannot require CI+1, the CI is not reliable enough yet.
>
> Even if the CI will be reliable, a failed test which is not related to the
> submitted
> patch should not block unrelated changes.
This situation makes no sense, if it's reliable theres no error due to non
changed stuff (unless the issue that triggers the error is part of your patch
history, because it's merged or you are based on a broken patch, in this case
it should fail and block)
Here is an example:
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_created/3314/
Failed due to flaky networking tests, the failure is not related to the patch,
which changes multipath configuration for some device.
Flaky tests should be fixed, but blocking development because of them is not
productive.
The purpose of the CI is to help us move faster, not slow us down.
The pep8 tests is doing this right - it checks only the code added in the last
patch.
In the current situation, the only way to move forward is to fix the flaky
tests or mark them as @brokentest.
> > Changes from current workflow:
> > Owner only adds reviewers, now owner needs to set "Workflow+1" for
the
> > patch
> > to be reviewed, and heavily auto-tested.
> > Maintainer now needs to set "Workflow+2" and wait for
"Submit" button to
> > appear after CI has completed running gating tests.
>
> -1
>
> This means more work for developers.
>
> Instead, make workflow default to +1. If the want to disable the CI, she
> can
> set it to 0.
>
> Developers need more power and less process overhead.
We have to make some stats, but from the small sample of patches that we
checked, most than half of them were not ready to pass any ci,
Of course they were not ready, it is expected and good, and save developer time.
The best way to check if a patch is ready for the CI is to let the CI run it :-)
it's a bit
more
work, if you keep using drafts, if you don't it's the same. It's not ment to
be
used alongside drafts but instead of it.
So the only case where it increases the overhead is if you did not use
drafts,
in which case you should start using them, so no real downside there.
How draft will help here? the CI run the tests on draft currently, and it is
very convenient.
It would be even more convenient if I can disable the CI for a draft, because
it is not complete enough to pass the tests.
Nir