views:

1736

answers:

7

I'm trying to find some good examples of semantic diff/merge utilities. The traditional paradigm of comparing source code files works by comparing lines and characters, but are there any utilities out there (for any language) that actually consider the structure of code when comparing files?

For example, existing diff programs will report "difference found at character 2 of line 125. File x contains v-o-i-d, where file y contains b-o-o-l". A specialized tool should be able to report "Return type of method doSomething() changed from void to bool".

I would argue that this type of semantic information is actually what the user is looking for when comparing code, and should be the goal of next-generation programming tools. Are there any examples of this in available tools?

+1  A: 

I don't know of any tool that cute, but I would love to hear about one!

Looks like you want a tool that compares the parse tree rather than the source itself....

A first step could be to run a pretty printer before the diff, eliminating any differences that affect only presentation and not meaning.
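As a rough sketch of that first step, here is how it could look for Python source, using only the standard library (Python 3.9+ for `ast.unparse`). The function names `normalize` and `normalized_diff` are made up for this illustration. Parsing and re-emitting the code acts as the pretty printer: comments, spacing and redundant parentheses disappear, so an ordinary line diff only sees what is left.

```python
import ast
import difflib

def normalize(source: str) -> str:
    """Parse the code and re-emit it, discarding comments and layout."""
    return ast.unparse(ast.parse(source))

def normalized_diff(old: str, new: str) -> list[str]:
    """Line diff of the two sources after both have been pretty-printed."""
    old_lines = normalize(old).splitlines()
    new_lines = normalize(new).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines, "old", "new", lineterm=""))

old = "def f(a,b):\n    return a+b   # sum\n"
reformatted = "def f( a, b ):\n\n    return (a + b)\n"   # layout change only
changed = "def f(a, b):\n    return a - b\n"              # real change

print(normalized_diff(old, reformatted))       # no differences reported
print("\n".join(normalized_diff(old, changed)))
```

This still reports differences as lines, so it only removes formatting noise; it does not explain what changed structurally.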

A better one would be to have a representation of the parsed code and diff from there... Does anyone know of a good diff algorithm for trees?

siukurnin
(diff algo for trees) Heck that's a dissertation topic in itself.
jim
Eclipse is able to do that. Check my answer.
Hosam Aly
Is there a "pretty printer" wrapped into any of the text diff tools available? This seems like a much more trivial thing to create considering how many false positives it would save.
Andrew Hubbs
A: 

First off, I haven't heard of any tool that works like this.

But as pointed out, it seems that you're aiming to compare the source tree rather than the text itself, as diff tools usually do. Maybe the first step is to check how different code completion tools, such as Visual Assist, build their trees, and compare them.

Take the base copy and the working copy, build a tree from each and compare the trees.
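A minimal sketch of that idea, assuming Python source and the standard `ast` module (the helper names `signatures` and `describe_changes` are invented for this example). It builds a tree from each copy and compares only function return annotations, which is enough to produce the kind of message the question asks for.

```python
import ast

def signatures(source: str) -> dict[str, str]:
    """Map each function name to its return annotation ('' if there is none)."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            result[node.name] = ast.unparse(node.returns) if node.returns else ""
    return result

def describe_changes(base: str, working: str) -> list[str]:
    """Report added, removed and retyped functions between two versions."""
    old, new = signatures(base), signatures(working)
    report = []
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            report.append(f"Function {name}() removed")
        elif name not in old:
            report.append(f"Function {name}() added")
        elif old[name] != new[name]:
            before = old[name] or "unannotated"
            after = new[name] or "unannotated"
            report.append(f"Return type of {name}() changed from {before} to {after}")
    return report

base = "def do_something() -> None:\n    pass\n"
working = "def do_something() -> bool:\n    return True\n"
print(describe_changes(base, working))
# ['Return type of do_something() changed from None to bool']
```

Note the simplification: functions are keyed by bare name, so two methods with the same name in different classes would collide. A real tool would key on the qualified name and compare far more than the return type.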

mizipzor
+2  A: 

The solution to this would be on a per-language basis. That is, unless the tool is designed with a plugin architecture that defers most of the parsing of the code into a tree, and the semantic comparison, to a language-specific plugin, it will be very difficult to support multiple languages. What language(s) are you interested in having such a tool for? Personally I'd love one for C#.

For C# there is an assembly diff add-in to Reflector but it only does a diff on the IL not the C#.

You can download the diff add-in here [zip] or go to the project on the codeplex site here.

Jonathan Parker
See http://www.semdesigns.com/Products/SmartDifferencer/index.html for a syntax tree-based comparison engine that works with many languages, using exactly the language plugin style. Not released yet, but a C# version is very close.
Ira Baxter
Jan 2010: C# Smart Differencer is released.
Ira Baxter
+4  A: 

What you're groping for is a "tree diff". It turns out that this is much harder to do well than a simple line-oriented textual diff, which is really just the comparison of two flat sequences.

"A Fine-Grained XML Structural Comparison Approach" concludes, in part, with:

Our theoretical study as well as our experimental evaluation showed that the proposed method yields improved structural similarity results with respect to existing alternatives, while having the same time complexity (O(N^2))

(emphasis mine)

Indeed, if you're looking for more examples of tree differencing I suggest focusing on XML since that's been driving practical developments in that area.
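To give a feel for what a tree diff over XML looks like, here is a deliberately naive sketch using Python's `xml.etree.ElementTree`. It is not the algorithm from the paper quoted above: it just compares children position by position, so it cannot detect moved subtrees and will report a cascade of changes after a single insertion in the middle. The `<method>`/`<returns>` schema and the `tree_diff` function are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

def tree_diff(a, b, path=None):
    """Naive positional comparison of two XML elements; returns change messages."""
    path = path or a.tag
    if a.tag != b.tag:
        return [f"{path}: element replaced by <{b.tag}>"]
    changes = []
    if a.attrib != b.attrib:
        changes.append(f"{path}: attributes changed from {a.attrib} to {b.attrib}")
    text_a, text_b = (a.text or "").strip(), (b.text or "").strip()
    if text_a != text_b:
        changes.append(f"{path}: text changed from {text_a!r} to {text_b!r}")
    kids_a, kids_b = list(a), list(b)
    # Pair children by position and recurse into each pair.
    for i, (child_a, child_b) in enumerate(zip(kids_a, kids_b)):
        changes += tree_diff(child_a, child_b, f"{path}/{child_a.tag}[{i}]")
    # Whatever is left over on either side was deleted or inserted.
    for extra in kids_a[len(kids_b):]:
        changes.append(f"{path}: child <{extra.tag}> deleted")
    for extra in kids_b[len(kids_a):]:
        changes.append(f"{path}: child <{extra.tag}> inserted")
    return changes

old = ET.fromstring('<method name="doSomething"><returns>void</returns></method>')
new = ET.fromstring('<method name="doSomething"><returns>bool</returns></method>')
print(tree_diff(old, new))
# ["method/returns[0]: text changed from 'void' to 'bool'"]
```

The hard part that this sketch skips is choosing which child of one tree corresponds to which child of the other; that matching problem is where the extra complexity over a flat sequence diff comes from.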

bendin
Thanks for the link. I can think of a few different approaches for implementing semantic diff tools, and you are correct -- most can be abstracted into a "tree diff". More complex situations may even need to be abstracted into a "graph diff".
Yea. IBM's Rational Modeler (built on Eclipse) tries to do this with UML models (showing the differences between two models graphically). I can't comment on the usefulness of the results as I don't use it much.
bendin
I agree that XML is a good place to start, as you can simply come up with schemas to represent other structures (such as java code, for example), and use an XML based tree-diff to implement a code diff.
"do this" => do something akin to a "graph diff".
bendin
See http://www.semdesigns.com/Products/SmartDifferencer/index.html for a syntax tree-based comparison engine that works with many languages.
Ira Baxter
+10  A: 

Eclipse has had this feature for a long time. It's called "Structure Compare", and it's very nice. Here is a sample screenshot for Java, followed by another for an XML file:

(Note the minus and plus icons on methods in the upper pane.)

Eclipse's Java Structure Comparer

Eclipse's XML Structure Comparer

Hosam Aly
Does Structure Compare allow you to merge changes like other source control merge editors? I.e. Copy this method from this version to the other version.
Jonathan Parker
Yes, when you select a change or a difference (either in the upper or lower panes), the toolbar buttons (shown in the screenshots) give you the option to copy the change from left to right or vice versa.
Hosam Aly
+2  A: 

To do "semantic comparisons" well, you need to compare the syntax trees of the languages, and take into account the meaning of symbols. A really good semantic diff would understand the language semantics, and realize when one block of code was equivalent in function to another. Going this far requires a theorem prover, and while it would be extremely cute, isn't presently practical for a real tool.

A workable approximation of this is simply comparing syntax trees, and reporting changes in terms of structures inserted, deleted, moved, or changed. Getting somewhat closer to a "semantic comparison", one could report when an identifier is changed consistently across a block of code.
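As a toy illustration of the "identifier changed consistently" idea (not how any particular product implements it), here is a sketch over Python's standard `ast` module. All function names are invented for the example. It checks that the two trees have the same shape once variable and parameter names are blanked out, then checks that old names map one-to-one onto new names.

```python
import ast

def identifiers(tree):
    """Variable and parameter names in a deterministic traversal order."""
    out = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            out.append(node.id)
        elif isinstance(node, ast.arg):
            out.append(node.arg)
    return out

def shape(tree):
    """Dump of the tree with every variable and parameter name blanked out."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
    return ast.dump(tree)

def consistent_renames(old_src, new_src):
    """Return {old: new} if the only change is a consistent rename, else None."""
    old_tree, new_tree = ast.parse(old_src), ast.parse(new_src)
    old_ids, new_ids = identifiers(old_tree), identifiers(new_tree)
    if shape(old_tree) != shape(new_tree):
        return None  # something other than identifiers changed
    mapping = {}
    for old_name, new_name in zip(old_ids, new_ids):
        if mapping.setdefault(old_name, new_name) != new_name:
            return None  # one old name became two different new names
    if len(set(mapping.values())) != len(mapping):
        return None  # two old names were merged into one
    return {o: n for o, n in mapping.items() if o != n}

old = "def area(w, h):\n    total = w * h\n    return total\n"
new = "def area(w, h):\n    result = w * h\n    return result\n"
print(consistent_renames(old, new))   # {'total': 'result'}
```

This ignores scoping entirely (the same name in two different functions is treated as one identifier) and does not cover function or attribute names, so it is only the approximation described above, applied to one block at a time.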

See http://www.semanticdesigns.com/Products/SmartDifferencer/index.html for a syntax tree-based comparison engine that works with many languages, that does the above approximation.

EDIT Jan 2010: Versions available for C++, C#, Java, PHP, and COBOL. The website shows specific examples for most of these.

EDIT May 2010: Python and JavaScript added.

EDIT Oct 2010: EGL added.

Ira Baxter
A: 

A company called Zynamics offers a binary-level semantic diff tool. It uses a meta-assembly language called REIL to perform graph-theoretic analysis of 2 versions of a binary, and produces a color-coded graph to illustrate differences between them. I am not sure of the price, but I doubt it is free.

David V McKay