I got to thinking about how annoying messy merges can be and was wondering whether there are any language-aware diffing tools that go beyond the typical language-aware features of comment skipping and case sensitivity. It seems like the diffing engine could infer a lot more if it were aware of the syntax of what it is diffing instead of just treating it as text (essentially doing a diff on the abstract syntax tree).
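
To make "a diff on the abstract syntax tree" concrete, here is a minimal sketch in Python. It uses the standard-library `ast` module as a stand-in for a real language-aware parser, and `ast_diff` is a made-up helper, not how any tool in this thread works. It walks two parse trees in parallel and reports the paths where they differ. Comments and layout never reach the AST, so they cannot show up as changes. List children are aligned by index only, so an inserted statement would appear as a cascade of changes; a real tree differ would match children with a proper alignment algorithm.

```python
import ast

def _label(x):
    # Show AST nodes by their type name, everything else by repr.
    return type(x).__name__ if isinstance(x, ast.AST) else repr(x)

def ast_diff(old, new, path="module") -> list[str]:
    """Naive tree diff: list the paths at which two ASTs differ."""
    if isinstance(old, ast.AST) and isinstance(new, ast.AST):
        if type(old) is not type(new):
            return [f"{path}: {_label(old)} -> {_label(new)}"]
        changes = []
        # iter_fields skips position attributes (lineno, col_offset, ...),
        # so reformatting alone never counts as a change.
        for field, value in ast.iter_fields(old):
            changes += ast_diff(value, getattr(new, field, None), f"{path}.{field}")
        return changes
    if isinstance(old, list) and isinstance(new, list):
        changes = []
        # Children are aligned by index only (no insert/move detection).
        for i in range(max(len(old), len(new))):
            o = old[i] if i < len(old) else None
            n = new[i] if i < len(new) else None
            changes += ast_diff(o, n, f"{path}[{i}]")
        return changes
    # Leaf values: identifiers, constants, None, or a node paired with a missing one.
    return [] if old == new else [f"{path}: {_label(old)} -> {_label(new)}"]

print(ast_diff(ast.parse("x = 1  # one\n"), ast.parse("x=2\n")))
# prints ['module.body[0].value.value: 1 -> 2']
```

The comment and the missing spaces are invisible to the comparison; only the changed constant is reported.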

+2  A: 

Eclipse (the popular Java-based IDE) has a diff that is language-aware, at least for Java. It shows the top-level changes (methods added or removed, etc.) in a separate view above the normal line-based diff, so I guess they do a tree diff on the AST. I couldn't easily find the source code for it, though.

You get to that diff when you compare two Java files (Compare With -> Each other) or when you have SVN support (Subclipse) installed and do any svn diff (Compare With -> ...).
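
A rough sketch of that kind of "method added/removed" summary, assuming Python source and the stdlib `ast` module (Python 3.9+). `top_level_defs` and `structural_changes` are hypothetical helpers, not Eclipse's code. Each top-level function or class is keyed by name and compared via `ast.dump`, which leaves out line numbers by default, so merely reordering definitions is not reported as a change.

```python
import ast

DEF_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

def top_level_defs(source: str) -> dict[str, str]:
    """Map each top-level function/class name to a dump of its subtree."""
    # ast.dump omits line/column attributes by default, so a definition
    # that was only moved produces the same string.
    return {
        node.name: ast.dump(node)
        for node in ast.parse(source).body
        if isinstance(node, DEF_TYPES)
    }

def structural_changes(old: str, new: str) -> dict[str, list[str]]:
    before, after = top_level_defs(old), top_level_defs(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(n for n in before.keys() & after.keys() if before[n] != after[n]),
    }

OLD = "def f():\n    return 1\n\ndef g():\n    pass\n"
NEW = "def g():\n    pass\n\ndef f():\n    return 2\n\ndef h():\n    pass\n"
print(structural_changes(OLD, NEW))
# prints {'added': ['h'], 'removed': [], 'modified': ['f']}
```

Here `g` moved but is not reported. Duplicate top-level names are not handled: the later definition simply wins.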

Alexander Klimetschek
How does one invoke this feature in Eclipse?
Cheekysoft
A: 

I don't know that I would want a language-oriented tool doing anything for me other than ignoring whitespace. If you want a simple but powerful diff tool that can run across an entire source tree, I am very fond of ExamDiff, which can be configured to ignore a wide variety of whitespace and other differences that don't actually affect the code.

I understand why you are asking for this, but if you get it, I am afraid it will bite you later on, when team members ignore changes while committing code, or assume they are benign when they are actually committing the next nominee for "most obfuscated code module" into your version control system.

Joe Skora
I would take weirdly obfuscated spacing schemes if it meant I had the ability to change the order of methods in a class definition and not have my merging tool go nuts. Besides, at least in our environment code reviews are performed on actual code -- not diffs.
Luke
@Luke: See the SmartDifferencer tool in one of the answers.
Ira Baxter
+4  A: 

Try Beyond Compare from Scooter Software: http://www.scootersoftware.com. I swear by it. I use version 2 at present, but I'm upgrading soon. The new version cites "grammar-based comparison rules" as a feature, which sounds like what you want.

Before I had it, I also used to do something you might have already tried: run everything through artistic-style or another beautifier and then text-diff the canonicalized text.
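
The same canonicalize-then-diff trick, sketched for Python: `ast.unparse` (Python 3.9+) plays the role of the beautifier and `difflib` does the text diff. This swaps in a Python-specific canonicalizer for artistic-style, so treat it as an illustration of the idea rather than the workflow described above. Round-tripping through the AST also drops comments, so comment-only edits vanish from the diff.

```python
import ast
import difflib

def canonical_lines(source: str) -> list[str]:
    # Parse and regenerate the code: layout is normalized and comments are dropped.
    return ast.unparse(ast.parse(source)).splitlines()

def canonical_diff(old: str, new: str) -> str:
    """Unified text diff of the canonicalized sources ('' if they match)."""
    lines = difflib.unified_diff(
        canonical_lines(old), canonical_lines(new),
        fromfile="old", tofile="new", lineterm="",
    )
    return "\n".join(lines)

print(repr(canonical_diff("x=1\ny = [1,2]  # note\n", "x = 1\ny = [1, 2]\n")))  # ''
print(canonical_diff("x = 1\n", "x = 2\n"))
```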

BC understands something like 15 or 20 languages and has been very good to me. I've used it over FTP, I've told it to ignore CVS directories in a CVS working directory, and I've used it (to good effect) to sync over a Samba share. It also has plugins to compare MP3s based on metadata, and it has a nice image differencer.

It's also very cheap, and has about the kindest upgrade pricing I've ever seen (unless it's changed, you get the advantage of the volume discount for the number of licenses you'll have after the upgrade!).

I've briefly tried it under Wine on a Linux box, and the basics seemed OK, but I didn't do any stress testing. That was version 2; version 3 also has a Linux version.

Thomas Kammeyer
Grammar-based comparison sounds interesting, but I couldn't find any information about it on their site. Where is it?
Luke
+2  A: 

You want a structured diff viewer. There is one for C# and Delphi at http://www.modelmakertools.com/structured-diff-viewer/index.html which may be of interest. It understands the language, so it can report more meaningful differences.

mj2008
He might want a viewer, but for source code repositories I would actually want a tool that created standard patches while still being language-aware. There are multiple ways to represent the difference between two text files, and some make more sense from a language perspective than others.
MSalters
+1  A: 

See our http://www.semdesigns.com/Products/SmartDifferencer/index.html for a tool that is parameterized by language grammar and produces deltas in terms of language elements (identifiers, expressions, statements, blocks, methods, ...) that were inserted, deleted, moved, or replaced, or within which an identifier was consistently renamed. This tool ignores whitespace reformatting (e.g., different line breaks or layouts) and semantically indistinguishable values (e.g., it knows that 0x0F and 15 are the same value).

EDIT: Handles C#, Java, COBOL, ECMAScript now. EDIT: Handles C, C++, PHP now.
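
As a rough illustration of two of those ideas (this is not how SmartDifferencer is implemented), the Python sketch below treats two snippets as equal when they differ only by a consistent renaming of variables or by how a literal is spelled. Python's parser already turns `0x0F` and `15` into the same constant. The hypothetical `_RenameVars` transformer renames every `ast.Name` to `v0`, `v1`, ... in order of first appearance. It deliberately ignores function names and parameters (`ast.arg`), and it only answers yes/no rather than producing a delta.

```python
import ast

class _RenameVars(ast.NodeTransformer):
    """Hypothetical helper: rename each variable to v0, v1, ... by first appearance."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Only ast.Name is renamed; function names and parameters (ast.arg) are left alone.
        new_id = self.mapping.setdefault(node.id, f"v{len(self.mapping)}")
        return ast.copy_location(ast.Name(id=new_id, ctx=node.ctx), node)

def same_up_to_renaming(old: str, new: str) -> bool:
    """True if the snippets differ only by consistent variable renaming or literal spelling."""
    def normalized(src: str) -> str:
        # Constants compare by value: 0x0F and 15 both parse to Constant(value=15).
        return ast.dump(_RenameVars().visit(ast.parse(src)))
    return normalized(old) == normalized(new)

print(same_up_to_renaming("total = 0x0F\nprint(total)", "count = 15\nprint(count)"))  # True
print(same_up_to_renaming("a = b", "a = a"))  # False: not a consistent renaming
```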

Ira Baxter