[ovirt-devel] gerrit+ci improvement proposal

Sandro Bonazzola sbonazzo at redhat.com
Thu Jun 4 06:50:56 UTC 2015


Il 04/06/2015 08:46, Eyal Edri ha scritto:
> 
> 
> ----- Original Message -----
>> From: "Sandro Bonazzola" <sbonazzo at redhat.com>
>> To: "Eyal Edri" <eedri at redhat.com>, "Max Kovgan" <mkovgan at redhat.com>
>> Cc: devel at ovirt.org, infra at ovirt.org
>> Sent: Thursday, June 4, 2015 9:11:10 AM
>> Subject: Re: [ovirt-devel] gerrit+ci improvement proposal
>>
>> Il 03/06/2015 21:46, Eyal Edri ha scritto:
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Max Kovgan" <mkovgan at redhat.com>
>>>> To: devel at ovirt.org
>>>> Cc: infra at ovirt.org
>>>> Sent: Wednesday, June 3, 2015 8:22:54 PM
>>>> Subject: [ovirt-devel] gerrit+ci improvement proposal
>>>>
>>>> Hi everyone!
>>>> We really want to have reliable and snappy CI: to allow short cycles and
>>>> encourage developers to write tests.
>>>>
>>>> # Problem
>>>>
>>>> Many patches are neither ready for review nor for CI upon submission,
>>>> which
>>>> is OK.
>>>> But running all the jobs on those patches with limited resources results
>>>> in:
>>>> overloaded resources, slow response time, unhappy developers.
>>>>
>>>> # Proposed Solution
>>>>
>>>> To run less jobs we know we don’t need to, thus making more resources for
>>>> the
>>>> jobs we need to run.
>>>> We have been experimenting to make our CI stabler and quicker to respond
>>>> by
>>>> using gerrit flags. This has improved in both directions very well
>>>> internally.
>>>> Now it seems a good time to let all the oVirt projects to use this.
>>>> This solution indirectly promotes reviews and quick tests - “to fail
>>>> early”,
>>>> yet full blown static code analysis and long tests to run “when ready”.
>>>>
>>>> # How it works
>>>>
>>>> 2 new gerrit independent flags are added to gerrit.
>>>>
>>>> ## CI flag
>>>>
>>>> Will express patch CI status. Values:
>>>>  * +1 CI passed
>>>>  *  0 CI did not run yet
>>>>  * -1 CI failed
>>>> Permissions for setting: project maintainers (for special cases) should be
>>>> able to set/override (except Jenkins).
>>>>
>>>> ## Workflow flag
>>>>
>>>> Will express patch “workflow” state. Values:
>>>>  *  0 Work In Progress
>>>>  * +1 Ready For Review
>>>>  * +2 Ready For Merge
>>>> Permissions for setting: Owner can set +1, Project Maintainers can set +2
>>>>
>>>> ## Review + CI Integration:
>>>>
>>>> Merging [“Submit” button to appear] will require: Review+1, CI+1,
>>>> Workflow+2
>>>> Patch lifecycle now is:
>>>> ---------------------------------------------------------------
>>>> patch state   |owner     |reviewer |maintainer |CI tests |pass
>>>> ---------------------------------------------------------------
>>>> added/updated |-         |-        |-          |quick    |CI+1
>>>> review        |Workflow+1|Review+1 |-          |heavy	 |CI+1
>>>> merge ready   |-         |-        |Workflow+2 |gating   |CI+1
>>>> merge         |-         |-        |merge      |merge    |CI+1
>>>>
>>>> Changes from current workflow:
>>>> Owner only adds reviewers, now owner needs to set "Workflow+1" for the
>>>> patch
>>>> to be reviewed, and heavily auto-tested.
>>>> Maintainer now needs to set "Workflow+2" and wait for "Submit" button to
>>>> appear after CI has completed running gating tests.
>>>>
>>>>
>>>> Next step will be to automate merge the change after Workflow+2 has been
>>>> set
>>>> by the Maintainer and gating tests passed.
>>>>
>>>>
>>>> ## Why now?
>>>>
>>>> It is elimination of waste. The sooner - the better.
>>>> The solution has been used for a while and it works.
>>>> Resolving the problem without gerrit involved will lead to adding
>>>> unreliable
>>>> code into jobs, and will still be prone to problems:
>>>>   Just recently, 3d ago we’ve tried detecting what to run from jenkins
>>>>   relying only on gerrit comments so that upon Verified+1, we’d run the
>>>>   job.
>>>>   We could not use “Review+1”, because it makes no sense at all, so we
>>>>   left
>>>>   the job to set Verified+1.
>>>>   Meaning - re-trigger itself immediately more than 1 times.
>>>>   
>>>>   Jenkins and its visitors very unhappy, and we had to stop those jobs,
>>>>   clean
>>>>   up the queue, and spam developers.
>>>>
>>>> ## OK OK OK. Now what?
>>>>
>>>> Now we want your comments and opinions before pushing this further:
>>>> Please participate in this thread, so we can start trying it out.
>>>> Ask, Suggest better ideas, all this is welcome.
>>>>
>>>>
>>>> Best Regards!
>>>>
>>>>
>>>> N.B.
>>>> Of course, this is not written in stone, in case we find a better approach
>>>> on
>>>> solving those issues, we will change to it.
>>>> And we will keep improving so don't be afraid that it will be enforced: if
>>>> this does not work out we will discard it.
>>>>
>>>> P.S.
>>>> Kudos to dcaro, most of the work was done by him, and most of this text
>>>> too.
>>>>
>>>
>>> +1 from me, releasing CI from running non critical and un-essential jobs
>>> will not only reduce load from ci,
>>> and shorted response time for developers, it will allow us to add much more
>>> powerful tests such as functional & system
>>> tests that actually add hosts and run VMs, improving our ability to find
>>> regression much more effectively.
>>>
>>> Another benefit to consider is saving reviewers time. I.e not only jenkins
>>> benefits from Worklow+1, but also human reviewers.
>>> Instead of looking at a patch that is too early to be reviewed, the author
>>> can set the Workflow+1 when the code is ready to review
>>> (even if he didn't verified it yet), thus saving time to other reviewers -
>>> for example people can add an email rule
>>> to alert them only when they are added to patches that have Workflow+1, and
>>> not before.
>>
>> For human reviewers I suggest to keep using drafts until the patch is
>> finished.
> 
> keep using? how many developers do you know are working with drafts until
> their patch is ready? 

We have ~350 drafts in gerrit so someone is using them :-)
https://gerrit.ovirt.org/#/q/is:draft

> i agree if everyone would use drafts load on jenkins was already much lower,
> unfortunately its not the case.
> 
>> Once it's finished and humans reviewed the logic of the patch, Workflow+1
>> should be triggered allowing automation to check the correctness of the
>> patch.
>> IMHO there's no reason for wasting CI time on patches that will be correct
>> from an automation point of view but nacked by reviewers. Especially if the
>> patches are part of a big patchset.
>>
>>
>>>
>>> And one final note, for Workflow+2 -> this is a preparation for a gating
>>> system, like Zuul used by openstack, that in the future
>>> we might use as automatic merger pending passing a verification step. this
>>> will prevent errors that happen sometimes
>>> post merge due to conflicts or other issues, and will be another level of
>>> validation before final merge.
>>> But as max said, its all part of the plan and we'll test it of course
>>> before implementing to see its value.
>>>
>>>
>>>>
>>>>
>>>>
>>>> Max Kovgan
>>>>
>>>> Senior Software Engineer
>>>> Red Hat - EMEA ENG Virtualization R&D
>>>> Tel.: +972 9769 2060
>>>> Email: mkovgan [at] redhat [dot] com
>>>> Web: http://www.redhat.com
>>>> RHT Global #: 82-72060
>>>>
>>>> _______________________________________________
>>>> Devel mailing list
>>>> Devel at ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/devel
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>
>>
>>
>> --
>> Sandro Bonazzola
>> Better technology. Faster innovation. Powered by community collaboration.
>> See how it works at redhat.com
>> _______________________________________________
>> Infra mailing list
>> Infra at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/infra
>>
>>
>>


-- 
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com



More information about the Infra mailing list