I'm working on a project with a team where we check in early and often. Individual checkins are completed to the satisfaction of the developer doing the change (including tests where possible) but sometimes the direction of work changes slightly and previous commits need to be reverted and done another way. Or, stub code is filled out in later commits.

When it comes time for code review, there's a sequence of commits all tagged with the same bug tracking id number. It's easy to get a list of such changes. When a reviewer looks through the changes one by one, sometimes there will be a commit A that is undone or modified by a later commit B as part of the same review. This can make reviewing more difficult.

If only one developer was working on the file for the duration of the change, then it's easy to do a diff between the original state of the file and the final state of the file. The problem arises when another developer happens to have made unrelated changes in the same file, or even in the same functions.

How do you handle this situation? Are there tools that, given a sequence of patches to a file, can give the moral equivalent of a diff between the first and last versions, but only including a subset of those patches?

Come to think of it, I could create a temporary branch in git starting from before the first related change, and cherry-pick the changes relevant to the review. Hopefully there won't be too many conflicts that need to be resolved (and if there are, then the whole lot should be reviewed at once anyway). Any other ideas?

More info: This happens to be a big legacy system where a single change might touch multiple files. The files are big and crufty - too big to just review the final product without an indication of what might have changed.

+6  A: 

I think you should be branching to do your changes. See my post here.

Brian R. Bondy
We have chosen to commit incremental development to the trunk in order to avoid the "big bang" problem where a developer takes weeks or more to develop a solution, then drops the whole thing in at once. Of course this would be after review, but doing frequent commits helps with integration testing.
Greg Hewgill
what do you mean by drops the whole thing at once?
Brian R. Bondy
When a developer is working on a work item on a branch, they are isolated from the rest of the team. It may be perfect work on the branch, but when it's merged into the trunk there is the potential for integration problems. Frequent commits to trunk help identify such problems earlier.
Greg Hewgill
from experience integration problems are minimal
Brian R. Bondy
Frequent commits with branches are also possible. Get the developers to branch off the trunk for every chunk of work they do and then merge the changes back to trunk. The 'big bang' problem is a development/project management issue, not a branching issue as such.
If such integration problems appear often then you should think about redesigning the architecture of your application.
Piotr Jakubowski
@Piotr: We don't have the luxury of redesigning the architecture, unfortunately. Integration problems are a consequence of the architecture we're stuck with.
Greg Hewgill
If you're using git then there's no excuse for developers on branches to not be merging changes from the trunk into their branch all the time. Especially if you only merge completed work into the trunk...
Michael Anderson

Uh, just look at the last state of the file.

(Branching for each change is a PITA. Don't do it.)

Unfortunately, we're working on a legacy system with huge crufty source files that are thousands of lines long. It's not feasible to just look at the current state of the file and discern what happened - chances are good that the reviewer hasn't seen the rest of the file anyway. Diffing is required.
Greg Hewgill
+2  A: 

It turns out you can do this fairly nicely in IntelliJ IDEA if you've got that tool:

Select Version Control | Show Changes View.

On the left hand side you select repository and click all of the revisions you want to review.

In the right hand pane you will get a list of all the files that are affected by the revisions you have selected. When you choose "diff" you will see the internal changes in the selected changesets. Internal re-works that occur within the commits are not shown (as can be expected)

I installed the latest IntelliJ IDEA (Community Edition 9.0.1) and this looks like a really good feature. It *almost* does everything we want for reviews. Selecting multiple nonadjacent commits on the left pane does indeed show only the files affected by those changes on the right hand pane (that's good). However, when viewing the associated diff for an individual file, it appears as though the diff shown is the total difference between the first selected and last selected commit. Therefore this includes changes from intervening commits that happened to touch the same file(s).
Greg Hewgill
I no longer use subversion, but I think it at least USED to be possible to get the diff without the intertwined other history. But then again (even with git) i'm still not entirely sure of the semantic value of that view. It's almost as if one is pretending code lives in a vacuum. But that's a different discussion

See our for a tool that is parametrized by language grammar, and produces deltas in terms of language elements (identifiers, expressions, statements, blocks, methods, ...) inserted, deleted, moved, replaced, or has identifiers substituted across it consistently. This tool ignores whitespace reformatting (e.g., different line breaks or layouts) and semantically indistinguishable values (e.g., it knows that 0x0F and 15 are the same value).

This won't disentangle two sets of changes made for fundamentally different reasons. But it will minimize the size of the deltas you have to inspect, and that will surely help.

You say the files are thousands of lines long. This sounds like COBOL. There's a smart differencer tool for C#, PHP, Java, C++ ... and COBOL, with examples for most of these at the web site.

Ira Baxter
"You say the files are thousands of lines long. This sounds like COBOL." What? You have never seen 3000-line long .cpp files?
Sure I have, I've even written a few. And in fact everybody has seen huge files written in every langauge ever invented (maybe not for APL but then...) However, the OP said "large legacy system" and implied *all* of his files were pretty big, so I guessed. Not a lot of harm done if I've guessed wrong, but I just wanted to make sure the OP understood there were solutions for COBOL too.
Ira Baxter
The system in question is actually written in Ada. Unfortunately, the Smart Differencer you cited doesn't appear to support that. We already try to minimise textual changes to reduce conflicts because there's a branch for each customer, with changes regularly ported between branches (by others, not by our team). So minimal textual changes are greatly appreciated by upstream CM specialists.
Greg Hewgill
Fortunately, the Smart Differencer is built using a generalized scheme, and is parameterized by language definitions. We have Ada83 and Ada95 language definitions! If you'd like to try an Ada Smart Differencer, I'll put one together for you.
Ira Baxter
... did put one together for Ada95.
Ira Baxter
+9  A: 

One approach (using subversion terminology, but similar approaches should work with other systems, too):

Check out a working copy at a revision just before the first changes. Then merge all related commits into your working copy.

Now you have a working copy which differs from its base just by the relevant changes. You can review this directly, or create a patch from it for review.

If some unrelated change after the base revision overlaps with the to-reviewed changes, you might get a merge conflict, but this should be rare.

Thanks, this seems like the most solid approach for our situation.
Greg Hewgill
+1  A: 

From a Subversion point of view

Given your requirements i would run code reviews by diffing against a tag from the last review; i.e. diff between two tags.

+1  A: 

If you have the ability to switch source control systems, you might seriously consider git (or mercurial or other similar tool).

Each developer creates their own local repository.

For each bug they work on, they create a branch.

When they're done with the changes, they can play with their history using rebase commands to bundle up changes into coherent checkpoints.

The developer then submits the rebased changeset for review.

The integrator or reviewer can then accept the changes as-is, cherry-pick to grab what they want to keep, and/or do more rebasing to break up commits that shouldn't have been grouped together.

Mr Fooz
is this methodology described in more detail somewhere? I'd like to learn more.
Thorbjørn Ravn Andersen

The patchutils package contains a combinediff tool that can "create a cumulative unified patch from two incremental patches".

+1  A: 

I would do pretty much what you suggested. Create a review branch. Get the developer(s) whos changes are being reviewed to do the cherry-picking into that branch. Diff the root of the branch with the head.

I'd probably also create a little tool to grab all the checkins that match a given bug tracking id tag and mash them together into an approximate script to do the cherry picking. I'd definately give the user a chance to edit this script before running it though as I'll bet some stuff gets miss-tagged.

Do the review on the branch. Make any edits on the branch. Cherry pick the edits back to the trunk. Discard the review branch if you don't want it anymore.

Michael Anderson