[ovirt-devel] gerrit+ci improvement proposal

Thu Jun 4 06:46:40 UTC 2015

----- Original Message -----
> From: "Sandro Bonazzola" <sbonazzo at redhat.com>
> To: "Eyal Edri" <eedri at redhat.com>, "Max Kovgan" <mkovgan at redhat.com>
> Cc: devel at ovirt.org, infra at ovirt.org
> Sent: Thursday, June 4, 2015 9:11:10 AM
> Subject: Re: [ovirt-devel] gerrit+ci improvement proposal
> 
> Il 03/06/2015 21:46, Eyal Edri ha scritto:
> > 
> > 
> > ----- Original Message -----
> >> From: "Max Kovgan" <mkovgan at redhat.com>
> >> To: devel at ovirt.org
> >> Cc: infra at ovirt.org
> >> Sent: Wednesday, June 3, 2015 8:22:54 PM
> >> Subject: [ovirt-devel] gerrit+ci improvement proposal
> >>
> >> Hi everyone!
> >> We really want to have reliable and snappy CI: to allow short cycles and
> >> encourage developers to write tests.
> >>
> >> # Problem
> >>
> >> Many patches are neither ready for review nor for CI upon submission,
> >> which
> >> is OK.
> >> But running all the jobs on those patches with limited resources results
> >> in:
> >> overloaded resources, slow response time, unhappy developers.
> >>
> >> # Proposed Solution
> >>
> >> To run less jobs we know we don’t need to, thus making more resources for
> >> the
> >> jobs we need to run.
> >> We have been experimenting to make our CI stabler and quicker to respond
> >> by
> >> using gerrit flags. This has improved in both directions very well
> >> internally.
> >> Now it seems a good time to let all the oVirt projects to use this.
> >> This solution indirectly promotes reviews and quick tests - “to fail
> >> early”,
> >> yet full blown static code analysis and long tests to run “when ready”.
> >>
> >> # How it works
> >>
> >> 2 new gerrit independent flags are added to gerrit.
> >>
> >> ## CI flag
> >>
> >> Will express patch CI status. Values:
> >>  * +1 CI passed
> >>  *  0 CI did not run yet
> >>  * -1 CI failed
> >> Permissions for setting: project maintainers (for special cases) should be
> >> able to set/override (except Jenkins).
> >>
> >> ## Workflow flag
> >>
> >> Will express patch “workflow” state. Values:
> >>  *  0 Work In Progress
> >>  * +1 Ready For Review
> >>  * +2 Ready For Merge
> >> Permissions for setting: Owner can set +1, Project Maintainers can set +2
> >>
> >> ## Review + CI Integration:
> >>
> >> Merging [“Submit” button to appear] will require: Review+1, CI+1,
> >> Workflow+2
> >> Patch lifecycle now is:
> >> ---------------------------------------------------------------
> >> patch state   |owner     |reviewer |maintainer |CI tests |pass
> >> ---------------------------------------------------------------
> >> added/updated |-         |-        |-          |quick    |CI+1
> >> review        |Workflow+1|Review+1 |-          |heavy	 |CI+1
> >> merge ready   |-         |-        |Workflow+2 |gating   |CI+1
> >> merge         |-         |-        |merge      |merge    |CI+1
> >>
> >> Changes from current workflow:
> >> Owner only adds reviewers, now owner needs to set "Workflow+1" for the
> >> patch
> >> to be reviewed, and heavily auto-tested.
> >> Maintainer now needs to set "Workflow+2" and wait for "Submit" button to
> >> appear after CI has completed running gating tests.
> >>
> >>
> >> Next step will be to automate merge the change after Workflow+2 has been
> >> set
> >> by the Maintainer and gating tests passed.
> >>
> >>
> >> ## Why now?
> >>
> >> It is elimination of waste. The sooner - the better.
> >> The solution has been used for a while and it works.
> >> Resolving the problem without gerrit involved will lead to adding
> >> unreliable
> >> code into jobs, and will still be prone to problems:
> >>   Just recently, 3d ago we’ve tried detecting what to run from jenkins
> >>   relying only on gerrit comments so that upon Verified+1, we’d run the
> >>   job.
> >>   We could not use “Review+1”, because it makes no sense at all, so we
> >>   left
> >>   the job to set Verified+1.
> >>   Meaning - re-trigger itself immediately more than 1 times.
> >>   
> >>   Jenkins and its visitors very unhappy, and we had to stop those jobs,
> >>   clean
> >>   up the queue, and spam developers.
> >>
> >> ## OK OK OK. Now what?
> >>
> >> Now we want your comments and opinions before pushing this further:
> >> Please participate in this thread, so we can start trying it out.
> >> Ask, Suggest better ideas, all this is welcome.
> >>
> >>
> >> Best Regards!
> >>
> >>
> >> N.B.
> >> Of course, this is not written in stone, in case we find a better approach
> >> on
> >> solving those issues, we will change to it.
> >> And we will keep improving so don't be afraid that it will be enforced: if
> >> this does not work out we will discard it.
> >>
> >> P.S.
> >> Kudos to dcaro, most of the work was done by him, and most of this text
> >> too.
> >>
> > 
> > +1 from me, releasing CI from running non critical and un-essential jobs
> > will not only reduce load from ci,
> > and shorted response time for developers, it will allow us to add much more
> > powerful tests such as functional & system
> > tests that actually add hosts and run VMs, improving our ability to find
> > regression much more effectively.
> > 
> > Another benefit to consider is saving reviewers time. I.e not only jenkins
> > benefits from Worklow+1, but also human reviewers.
> > Instead of looking at a patch that is too early to be reviewed, the author
> > can set the Workflow+1 when the code is ready to review
> > (even if he didn't verified it yet), thus saving time to other reviewers -
> > for example people can add an email rule
> > to alert them only when they are added to patches that have Workflow+1, and
> > not before.
> 
> For human reviewers I suggest to keep using drafts until the patch is
> finished.

keep using? how many developers do you know are working with drafts until
their patch is ready? 
i agree if everyone would use drafts load on jenkins was already much lower,
unfortunately its not the case.

> Once it's finished and humans reviewed the logic of the patch, Workflow+1
> should be triggered allowing automation to check the correctness of the
> patch.
> IMHO there's no reason for wasting CI time on patches that will be correct
> from an automation point of view but nacked by reviewers. Especially if the
> patches are part of a big patchset.
> 
> 
> > 
> > And one final note, for Workflow+2 -> this is a preparation for a gating
> > system, like Zuul used by openstack, that in the future
> > we might use as automatic merger pending passing a verification step. this
> > will prevent errors that happen sometimes
> > post merge due to conflicts or other issues, and will be another level of
> > validation before final merge.
> > But as max said, its all part of the plan and we'll test it of course
> > before implementing to see its value.
> > 
> > 
> >>
> >>
> >>
> >> Max Kovgan
> >>
> >> Senior Software Engineer
> >> Red Hat - EMEA ENG Virtualization R&D
> >> Tel.: +972 9769 2060
> >> Email: mkovgan [at] redhat [dot] com
> >> Web: http://www.redhat.com
> >> RHT Global #: 82-72060
> >>
> >> _______________________________________________
> >> Devel mailing list
> >> Devel at ovirt.org
> >> http://lists.ovirt.org/mailman/listinfo/devel
> > _______________________________________________
> > Devel mailing list
> > Devel at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/devel
> > 
> 
> 
> --
> Sandro Bonazzola
> Better technology. Faster innovation. Powered by community collaboration.
> See how it works at redhat.com
> _______________________________________________
> Infra mailing list
> Infra at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
> 
> 
>