Diff tools usually work on a line basis. That works pretty well, except when two people each add a new method at the end of a file: the result is a conflict, even with a 3-way diff, because the system thinks both people tried to add something different at the same place, even though in that case the exact location doesn't really matter.

So I was wondering whether it would be possible to do a more intelligent diff, possibly using the abstract syntax tree (AST) of the language, so that the diff would "understand" that you added or moved a method or variable (rather than just shuffling blocks of lines around).
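To make the idea concrete, here is a minimal sketch in Python (chosen only because its standard library ships a parser, not because the question is about Python). The helper names `top_level_defs` and `structural_diff` are made up for illustration, and the sketch only looks at top-level functions and classes:

```python
import ast


def top_level_defs(source):
    """Map each top-level function/class name to a layout-independent AST dump."""
    tree = ast.parse(source)
    defs = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast.dump() leaves out line/column attributes by default, so
            # reindenting or moving a definition does not change its dump.
            defs[node.name] = ast.dump(node)
    return defs


def structural_diff(old_source, new_source):
    """Report which top-level definitions were added, removed, or changed."""
    old = top_level_defs(old_source)
    new = top_level_defs(new_source)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


base = "def a():\n    return 1\n"
alice = base + "\ndef b():\n    return 2\n"
bob = base + "\ndef c():\n    return 3\n"

print(structural_diff(base, alice))
print(structural_diff(base, bob))
```

Under these assumptions, the two edits are reported as adding different names, so a merge built on top of this would have no reason to flag a conflict, whereas a line-based merge sees two different insertions at the same position.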

I understand that such a diff would be language-specific, but since many languages share the same structure, it might not be too hard to build something usable for several languages.

The thing is, if it's not a stupid idea, somebody should have done it already. So the question is: does such a tool already exist, and if not, why?

(If it's not stupid, and it's new and useful, I might be interested in launching such a project.)

update

I'm looking for a free one (and, if possible, one for Ruby).

+1  A: 

See our Smart Differencer family of tools, one per language. They compare ASTs and identify changes in terms of programming structures (e.g., identifiers, expressions, statements, blocks, methods, ...) and actions (insert, delete, replace, move, copy, rename). By operating on trees, these tools completely ignore layout.

They do some language-semantic checks (e.g., an identifier renamed consistently within a scope) and recognize some cases where a declaration was moved in a way that doesn't matter; we'll add more of these over time.

That bit about languages being "similar", so that it is easy to build a family of tools, is completely wrong. Either you want an accurate grammar (not a similar one) or you don't (and get bad diff answers). Then there's the problem of actually parsing the languages; building robust parsers for real languages is hard individually, and you need an army of them. We have special infrastructure built to handle this, and we've spent 15 years building a set of robust language definitions, including the various dialects of individual languages.

Ira Baxter
Thanks, that's pretty much what I'm looking for ... except I want a free one (and for Ruby). (I'll update my question.)
mb14
The underlying machinery used to generate these tools has a stable of language definitions. Ruby is in the works, but we're not in a rush.
Ira Baxter
P.S.: The folk wisdom is that Ruby is difficult to parse. (That doesn't bother us as much as it might others; C++ is hard to parse too, and we have that.)
Ira Baxter
I understand that languages differ in how they are parsed. However, I am thinking more of a simple structural approximation, without necessarily any semantics. Maybe a simple "nested blocks" structure without any semantics would be enough to do better than the current "split into lines" model (which already kind of works for every language). I understand why it would be better with semantics and proper parsing, and I'm not trying to compete with your product but with the standard diff.
mb14
You certainly can define a "weak" grammar which captures just the nesting of "{...}" and treats everything else as text, and build a matcher against that. If you go that far, you'll lose the ability to handle format changes, and you'll find that you have to lex the text strings and comments perfectly or you'll get spurious brackets from inside those. If you don't go for pure text, you'll have to decide which pseudo-tokens you will pick up; + is easy, but how about that regex token? The language details tend to get important fast if you want any detail.
Ira Baxter
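As an illustration of the "weak grammar" point, here is a rough sketch of a brace-nesting scanner that skips string literals and comments. It assumes C-like lexical rules (`//` and `/* */` comments, single- or double-quoted strings with backslash escapes), which is an assumption and would not hold for Ruby; the function name `brace_blocks` is made up for this example:

```python
def brace_blocks(text):
    """Return sorted (open_index, close_index) pairs for each balanced {...} block.

    Braces inside string literals and comments are ignored. The lexical
    rules are an assumed C-like subset, not those of any specific language.
    """
    blocks, stack = [], []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        two = text[i:i + 2]
        if two == "//":
            # Line comment: skip to just past the end of the line.
            end = text.find("\n", i)
            i = n if end == -1 else end + 1
        elif two == "/*":
            # Block comment: skip to just past the closing marker.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif ch in "\"'":
            # String literal: skip to the matching quote, honoring backslash escapes.
            i += 1
            while i < n and text[i] != ch:
                i += 2 if text[i] == "\\" else 1
            i += 1
        else:
            if ch == "{":
                stack.append(i)
            elif ch == "}" and stack:
                blocks.append((stack.pop(), i))
            i += 1
    return sorted(blocks)


sample = 'void f() { s = "}"; /* { */ if (x) { y(); } // }\n}'
for start, end in brace_blocks(sample):
    print(repr(sample[start:end + 1]))
```

Even this toy version needs three lexical special cases before the brace counting is trustworthy, and a language with regex literals or heredocs would need more, which is the problem described in the comment above.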