Re: [Engine-devel] Proposal for new commit msg design for engine commits

23 Aug 2013

      On Fri 23 Aug 2013 12:19:01 PM CEST, Alon Bar-Lev wrote:
...
----- Original Message -----
...
From: "David Caro Estevez" <dcaroest@redhat.com>
To: "Alon Bar-Lev" <alonbl@redhat.com>
Cc: "Eyal Edri" <eedri@redhat.com>, "engine-devel" <engine-devel@ovirt.org>, "infra" <infra@ovirt.org>
Sent: Friday, August 23, 2013 1:00:38 PM
Subject: Re: [Engine-devel] Proposal for new commit msg design for engine commits
----- Original Message -----
...
From: "Alon Bar-Lev" <alonbl@redhat.com>
To: "David Caro" <dcaroest@redhat.com>
Cc: "Eyal Edri" <eedri@redhat.com>, "engine-devel"
<engine-devel@ovirt.org>, "infra" <infra@ovirt.org>
Sent: Friday, August 23, 2013 10:45:37 AM
Subject: Re: [Engine-devel] Proposal for new commit msg design for engine
commits
----- Original Message -----
...
From: "David Caro" <dcaroest@redhat.com>
To: "Alon Bar-Lev" <alonbl@redhat.com>
Cc: "Eyal Edri" <eedri@redhat.com>, "engine-devel"
<engine-devel@ovirt.org>, "infra" <infra@ovirt.org>
Sent: Friday, August 23, 2013 11:16:31 AM
Subject: Re: [Engine-devel] Proposal for new commit msg design for engine
commits
On 07/20/2013 08:53 PM, Alon Bar-Lev wrote:
...
----- Original Message -----
...
From: "Eyal Edri" <eedri@redhat.com>
To: "Alon Bar-Lev" <alonbl@redhat.com>
Cc: "infra" <infra@ovirt.org>, "engine-devel"
<engine-devel@ovirt.org>,
"Fabian Deutsch" <fabiand@redhat.com>
Sent: Saturday, July 20, 2013 9:41:56 PM
Subject: Re: [Engine-devel] Proposal for new commit msg design for
engine
commits
This change to commit template has nothing to do with CI.
it's a change that should reflect updated components relevance to the
commit
code.
This commit template is mostly invalid, as touching more than one
'subsystem' is possible, and has not enough granularity.
I suggest using a tag at the end with some extra syntax, like:
Components: core, storage, db
Components: all
Components: all, !core, !db
...
For example, database change should trigger what?
All the jobs that are tagged for that component (db upgrades I suppose).
And if the changes affect storage components then also storage, if the
changes affect others then those others too.
...
Infra change should trigger what?
The same, all the jobs that are tagged as infra.
...
A change of both user interface and command should trigger what?
All the jobs tagged by user interface and/or command.
...
So you end up with:
userportal: storage: core: db: some message
As I suggested before, I think it's better if you end with a commit
message like:
Some message
Components: userportal, storage, core, db
Actually it can be easily done with a tag in the gerrit comment instead
of the commit message.
...
Just to make who happy?
The developer, the qe, the ci and the infra people. This mechanism is to
avoid running all the tests all the time. Of course there are some times
when all the tests should be run to make sure nothing else changed, but
most times you just need to run part of them to make sure you did not
break something obvious.
...
And maybe there are 50 tests of network, and you need only 5 of them
for
the specific change, how do you mark it, so now a developer need to
know
any test? what if you add one tomorrow which is relevant to a similar
change? how do you inform the developer that now he needs 6?
As I said before, what the developer specifies is not a list of tests,
but a list of components, that qe has to map to different sets of tests
that can change with time. So specifying webadmin will run all the tests
in that group, that might be only one, or 100, and might be
increasing/decreasing with time transparently for the developer. Adding
a new component is not common and there's no need to do it so frequently.
...
Why should it be the developer responsibility and not the quality
ensuring
engineer responsibility to determine which tests should run and when?
Of course it's the responsibility of the qe engineer to determine when
and which tests should be run. But this is meant to be a new tool for
the developer not a substitute for the full qe tests, so the developer
can easily make sure that he's changes do not break anything obvious
before starting the real tests (that will take more time and resources).
The developer just adds some metadata so the qe engineer can decide
which tests to run per patch, so it's on qe's hand in the end to decide
if ignore or not the metadata and which tests to run.
...
As far as this template was not actually used for anything but humans,
it
was not that important, but once you formalize it as an interface, I
step
forward and state that the subject line is not the right tool for the
task
at hand (or any for this matter).
I agree with that, I think that it should be a tag similar to Change-Id,
at the end of the commit message.
...
The fact that you have in each commit are the sources that are
modified,
all the other data is just plain noise. From the sources that are
modified
you should be able to derive a test plan with high chance that this
test
program will be correct. Human intervention should be possible by
ordering
special tests that are outside of the standard policy, for cases in
which
the standard policy of deriving tests from sources is too narrow.
That's just not true. The sources are complicated enough to make two
changes in the same file to affect different components. Any reused code
is prone to affect multiple components, making it really hard to
determine by which changed files which tests to run. And if you go down
to the function/class level it's even harder to decide and to maintain.
And of course it's not human error free, as the metadata in the
files/directories is defined and maintained by a human. And in my
opinion is a lot harder to implement and maintain, and a lot less agile,
and does not get rid of the human factor.
Once again...
1. Commit messages are not the place to specify metadata to interact with
automation, it is a record for future reviewer.
Agree, that's why I suggested triggering it from a comment message. But that
will require the developer another step after pushing.
...
2. The metadata within sources are the mean to automate the list of tests
related to a specific source without human interaction on each commit.
The drawbacks are:
 - It needs a lot of maintenance
One time maintenance compared to per patch.
Easier to enforce project policy, than relay on developer's policy.
It's one time big implementation and continued maintenance, each time a 
new file is added, or moved it has to be changed, or when a file is 
used in more than one place for the first time ...
...
...
- It require very modular code and
Right, we require this anyway.
I agree, but the ETA of that is >> ETA of any other option
...
...
- Locks the developer on which tests he want to run
As I wrote, it locks nothing, it is the baseline.
You can't avoid executing those tests, and if you want something else 
you have to do more than just push, that is not an advantage in front 
of any other solution.
...
...
...
3. If there is doubt from list of tests run them all, this is simple rule
for
automation.
Of course, but if there is too much doubt all that maintenance is useless.
So you suggest we run partial?
Yes, I though that is what this thread was for, to discuss the 
possibility of running partial tests on patches so we do not overload 
the infra while getting some feedback before running the whole set of 
tests.
...
based on whose decision?
On the developers decision
...
can we trust him?
Enough to run some tests on his patch proposal, yes. If we do not trust 
any developer we become closed to newcomers, and that's against any 
open source principle (and red hat's too) in my opinion.
...
...
...
4. The metadata within files are not the only way to order tests, one can
do
this manually via jenkins or any other mean as one can now.
As with any other solution.
Right, because of that I do not think this is emergency.
Just have jenkins jobs and people can order tests based on gerrit urls...
The problem is that people can not control which tests get triggered, 
only which patch they run on. If you trigger from a gerrit url, all the 
tests that can be triggered will (for that repo, meaning 
engine/vdsm/scheduler...).
...
...
...
5. The metadata within files will help us to achieve other targets, such as
automatic CC maintainers, verify that +2/-2 are set by authorized
maintainer
of component etc...
That's true, but I think that maybe putting metadata in each file is
overkilling.
One file in the root of the repo with a few lines should be enough for that
afaik.
The metadata model I suggested was within each source and within directory.
The collection is metadata of source + recursive meta data of directories.
At file put signature.
At directory put a file, such as .ovirt.metadata with same fields.
I think that will lead to forgetting to update some sources as you are 
not sure where the metadata is, for example, a lot of people will 
forget to update file /a/b/c.txt and only update /a/.ovirt.metadata as 
it is not obvious where the metadata is held (I know, you can just 
execute a couple of command to find out, but in my experience people is 
lazy, and things that depend on people not being lazy get forgotten).
Having only one place for the metadata helps prevent that though.
...
...
...
6. Gerrit 2.6 supports labels[1] which are actually what you are trying to
achieve using the commit message because of lack of other solutions.
They are not fit for that, at least yet.
So maybe better if we can help gerrit to improve[1]?
[1] http://code.google.com/p/gerrit/issues/detail?id=2085
Thanks! Maybe we can spend some effort to make that happen, anyone with 
java skills and willing to help reading this thread?
...
...
...
Until we have metadata within source, and we don't as we void discuss this
for long time, and try to find manual workarounds and solutions.
Or until we have tags in the messages, for the same reasons. Don't forget
that
metadata within source is also a 'manual workaround or solution', as someone
has
to maintain all the metadata in all the files/directories (maybe for that
purpose
will be better to have just one file with all that metadata, depending on
which
level of granularity we need to map all the files).
I think best is to spread metadata within the entire tree, as then maintainer of file control the metadata, if file is copied/duplicated/moved/renamed it keeps the metadata.
I can see your point, but that requires a lot of maintenance as I see 
it...
...
...
...
Add label for each test, this will allow ordering tests via the gerrit web
interface post submit.
Adding a comment will too.
So why relay on commit message, you can just ask people to comment:
OVIRT_ORDER_TEST: test1
Agree, the commit message tag was though to avoid having to go to 
gerrit page and create a new comment for each patch you submit. That 
step has to be done manually right now. Maybe we can add a git hook on 
the client side that detects the tag in the commit message, deletes it 
and posts it as a separated comment on gerrit?
...
...
...
After people start to do this over and over and over and over they will
appreciate the need to add metadata into the source tree.
I think that having to maintain metadata in all the files is way more
annoying,
but yes, as you say adding metadata to each file gives a lot more
information,
but only if it's up to date.
Just like claiming that the source is out of date.
If there is a mistake in metadata the maintainer will detect that when metadata is used.
I'm sorry, but I think that most of our code is out to date :s, meaning 
it's not what it should be (not modular enough, duplicated code, legacy 
code... I haven't heard of a developer sayiong, wow! this code is 
really good and well maintained, it's usually the other way around), 
and because of that I think that the metadata will have similar 
treatment, changing that is not an easy and fast task. Though I really 
believe that we will improve it. Maybe any other developers can supply 
their views on this?
...
...
...
In future, if this works out we can help gerrit to improve by enhancing the
labels into free text/combo box etc...
Or maybe try to do this now if you like to help out gerrit to improve...
That will make labels fit our purpose, and avoid having to add tags, metadata
and all that, but it's not done yet. I'll gladly add that feature, but I'm
not
familiar with java or gerrit code, so I might not be the best person to do
it,
I'm sure that it will take a lot more time for me than the other options
(Eyal,
any comment?). If you are fluent with java and you want to help you are very
welcome :)
You can help in making the metadata into the tree.
This will be a great step forward.
Of course, if that's what gets decided I will happily help as much as 
needed.
...
Thanks,
Alon
So to get this closed and start working in a solution I've created an 
etherpad:
http://etherpad.ovirt.org/p/commit_message_design

Please anyone with opinion add you comments and votes, maybe we can 
close this next week and start working! :)

Should we add vdsm also to the thread? They also want a solution like 
this for their tests.

ps. you can add another proposals if you have them, of course.

--
David Caro

Red Hat Czech s.r.o.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: dcaro@redhat.com
Web: www.cz.redhat.com
Red Hat Czech s.r.o., Purkyňova 99/71, 612 45, Brno, Czech Republic
RHT Global #: 82-62605