[ovirt-devel] gerrit+ci improvement proposal

Thu Jun 4 08:08:07 UTC 2015

----- Original Message -----
> From: "Nir Soffer" <nsoffer at redhat.com>
> To: "David Caro" <dcaroest at redhat.com>
> Cc: devel at ovirt.org, infra at ovirt.org
> Sent: Thursday, June 4, 2015 10:57:30 AM
> Subject: Re: [ovirt-devel] gerrit+ci improvement proposal
> 
> ----- Original Message -----
> > From: "David Caro" <dcaroest at redhat.com>
> > To: "Nir Soffer" <nsoffer at redhat.com>
> > Cc: "Max Kovgan" <mkovgan at redhat.com>, infra at ovirt.org, devel at ovirt.org
> > Sent: Thursday, June 4, 2015 10:35:12 AM
> > Subject: Re: [ovirt-devel] gerrit+ci improvement proposal
> > 
> > On 06/04, Nir Soffer wrote:
> > > ----- Original Message -----
> > > > From: "Max Kovgan" <mkovgan at redhat.com>
> > > > To: devel at ovirt.org
> > > > Cc: infra at ovirt.org
> > > > Sent: Wednesday, June 3, 2015 8:22:54 PM
> > > > Subject: [ovirt-devel] gerrit+ci improvement proposal
> > > > 
> > > > Hi everyone!
> > > > We really want to have reliable and snappy CI: to allow short cycles
> > > > and
> > > > encourage developers to write tests.
> > > > 
> > > > # Problem
> > > > 
> > > > Many patches are neither ready for review nor for CI upon submission,
> > > > which
> > > > is OK.
> > > > But running all the jobs on those patches with limited resources
> > > > results
> > > > in:
> > > > overloaded resources, slow response time, unhappy developers.
> > > > 
> > > > # Proposed Solution
> > > > 
> > > > To run less jobs we know we don’t need to, thus making more resources
> > > > for
> > > > the
> > > > jobs we need to run.
> > > > We have been experimenting to make our CI stabler and quicker to
> > > > respond
> > > > by
> > > > using gerrit flags. This has improved in both directions very well
> > > > internally.
> > > > Now it seems a good time to let all the oVirt projects to use this.
> > > > This solution indirectly promotes reviews and quick tests - “to fail
> > > > early”,
> > > > yet full blown static code analysis and long tests to run “when ready”.
> > > > 
> > > > # How it works
> > > > 
> > > > 2 new gerrit independent flags are added to gerrit.
> > > > 
> > > > ## CI flag
> > > > 
> > > > Will express patch CI status. Values:
> > > >  * +1 CI passed
> > > >  *  0 CI did not run yet
> > > >  * -1 CI failed
> > > 
> > > -1
> > > 
> > > You must have CI error state. Most of the errors I have seen, the CI fail
> > > to run,
> > > and the failure is not related to the tested patch.
> > 
> > You'll have to prove this, as the last time we discussed it less than 10%
> > of
> > the failures from the previous 2 weeks were that case.
> 
> 10% is 10% too much :-)
> 
> > Also that's why maintainers can set +1 here and -1 is not a blocker.
> 
> Ok, than its fine.
> 
> > > > Permissions for setting: project maintainers (for special cases) should
> > > > be
> > > > able to set/override (except Jenkins).
> > > > 
> > > > ## Workflow flag
> > > > 
> > > > Will express patch “workflow” state. Values:
> > > >  *  0 Work In Progress
> > > >  * +1 Ready For Review
> > > >  * +2 Ready For Merge
> > > > Permissions for setting: Owner can set +1, Project Maintainers can set
> > > > +2
> > > > 
> > > > ## Review + CI Integration:
> > > > 
> > > > Merging [“Submit” button to appear] will require: Review+1, CI+1,
> > > > Workflow+2
> > > > Patch lifecycle now is:
> > > > ---------------------------------------------------------------
> > > > patch state   |owner     |reviewer |maintainer |CI tests |pass
> > > > ---------------------------------------------------------------
> > > > added/updated |-         |-        |-          |quick    |CI+1
> > > > review        |Workflow+1|Review+1 |-          |heavy	 |CI+1
> > > > merge ready   |-         |-        |Workflow+2 |gating   |CI+1
> > > > merge         |-         |-        |merge      |merge    |CI+1
> > > 
> > > -1
> > > 
> > > Cannot require CI+1, the CI is not reliable enough yet.
> > > 
> > > Even if the CI will be reliable, a failed test which is not related to
> > > the
> > > submitted
> > > patch should not block unrelated changes.
> > 
> > This situation makes no sense, if it's reliable theres no error due to non
> > changed stuff (unless the issue that triggers the error is part of your
> > patch
> > history, because it's merged or you are based on a broken patch, in this
> > case
> > it should fail and block)
> 
> Here is an example:
> 
> http://jenkins.ovirt.org/job/vdsm_master_unit-tests_created/3314/
> 
> Failed due to flaky networking tests,  the failure is not related to the
> patch,
> which changes multipath configuration for some device.
> 
> Flaky tests should be fixed, but blocking development because of them is not
> productive.
> 
> The purpose of the CI is to help us move faster, not slow us down.
> 
> The pep8 tests is doing this right - it checks only the code added in the
> last
> patch.
> 
> In the current situation, the only way to move forward is to fix the flaky
> tests or mark them as @brokentest.
> 
> > > > Changes from current workflow:
> > > > Owner only adds reviewers, now owner needs to set "Workflow+1" for the
> > > > patch
> > > > to be reviewed, and heavily auto-tested.
> > > > Maintainer now needs to set "Workflow+2" and wait for "Submit" button
> > > > to
> > > > appear after CI has completed running gating tests.
> > > 
> > > -1
> > > 
> > > This means more work for developers.
> > > 
> > > Instead, make workflow default to +1. If the want to disable the CI, she
> > > can
> > > set it to 0.
> > > 
> > > Developers need more power and less process overhead.
> > 
> > We have to make some stats, but from the small sample of patches that we
> > checked, most than half of them were not ready to pass any ci,
> 
> Of course they were not ready, it is expected and good, and save developer
> time.
> 
> The best way to check if a patch is ready for the CI is to let the CI run it
> :-)
> 
> > it's a bit
> > more
> > work, if you keep using drafts, if you don't it's the same. It's not ment
> > to
> > be
> > used alongside drafts but instead of it.
> > 
> > So the only case where it increases the overhead is if you did not use
> > drafts,
> > in which case you should start using them, so no real downside there.
> 
> How draft will help here? the CI run the tests on draft currently, and it is
> very convenient.
> 

it might be convenient, but unfourtunately we don't have enough resources in terms of slaves/servers
to to on drafts, which cause jobs in jenkins to run much longer than they should....
i agree in a perfect world without a resource problem, we would run all tests on all patches.

> It would be even more convenient if I can disable the CI for a draft, because
> it is not complete enough to pass the tests.
> 
> Nir
> _______________________________________________
> Infra mailing list
> Infra at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
> 
> 
>