views:

420

answers:

4

I have some old software (in a language that's not dead but is dead to me ;-)) that implements a basic pattern-matching and -rewriting system for source code. I am considering resurrecting this code, translating it into a modern language, and open-sourcing the project as a refactoring power-tool. Before I go much further, I want to know if anything like this exists already (my google-fu is fanning air on this tonight).

Here's how it works:

  • the pattern-matching part matches source-code patterns spanning multiple lines of code using a template with binding variables,
  • the pattern-rewriting part uses a template to rewrite the matched code, inserting the contents of the bound variables from the matching template
  • matching and rewriting templates are associated (1:1) by a simple (unconditional) rewrite rule

the software operates on the abstract syntax tree (AST) of the input application, and outputs a modified AST which can then be regenerated into new source code

for example, suppose we find a bunch of while-loops that really should be for-loops. The following template will match the while-loop pattern:

Template oldLoopPtrn
 int @cnt@ = 0;
 while (@cnt@ < @max@)
 {
  … @body@
  ++@cnt@;
 }
End_Template

while the following template will specify the output rewrite pattern:

Template newLoopPtrn
 for(int @cnt@ = 0; @cnt@ < @max@; @cnt@++)
 {
  @body@
 }
End_Template

and a simple rule to associate them

Rule oldLoopPtrn --> newLoopPtrn

so code that looks like this

int i=0;
while(i<arrlen)
{
    printf("element %d: %f\n",i,arr[i]);
    ++i;
}

gets automatically rewritten to look like this

for(int i = 0; i < arrlen; i++)
{
    printf("element %d: %f\n",i,arr[i]);
}

The closest thing I've seen like this is some of the code-refactoring tools, but they seem to be geared towards interactive rewriting of selected snippets, not wholesale automated changes.

I believe that this kind of tool could supercharge refactoring, and would work on multiple languages (even HTML/CSS). I also believe that converting and polishing the code base would be a huge project that I simply cannot do alone in any reasonable amount of time.

So, anything like this out there already? If not, any obvious features (besides rewrite-rule conditions) to consider?

EDIT: The one feature of this system that I like very much is that the template patterns are fairly obvious and easy to read because they're written in the same language as the target source code, not in some esoteric mutated regex/BNF format.

+1  A: 

TXL is rule based rather than pattern based, so it has more capabilities but perhaps a steeper learning curve.

Doug Currie
thanks for the link, that looks really interesting...but far from intuitive! ;-)
Steven A. Lowe
Rules in program transformation systems tend to just be *pairs* of patterns, the first one for a match, and the second one for regeneration, very much like what Lowe proposed. The idea isn't new. What matters is a robust implementation of the tool, *and* the ability to apply it to real source codes. TXL (and Stratego/XT) can do this but isn't really good at handle symbol-table type information (although one can fake it). DMS (mentioned in another answer) does this, but also provides/uses symbol table data as well as control/data flows, points-to analysis, and call graphs.
Ira Baxter
+2  A: 

I am considering resurrecting this code, translating it into a modern language, and open-sourcing the project as a refactoring power-tool.

I think it would be simply great to have such a tool freely available.

But there is a commercial product available: DMS Software Reengineering Toolkit.

The DMS Software Reengineering Toolkit is a set of tools for automating customized source program analysis, modification or translation or generation of software systems, containing arbitrary mixtures of languages ("domains"). The term "software" for DMS is very broad and covers any formal notation, including programming languages, markup languages, hardware description languages, design notations, data descriptions, etc. This toolkit is the first step towards the implementation of The Design Maintenance System™, an ambitious vision of a 21st Century software engineering environment that supports the incremental construction and maintenance of large application systems, driven by semantics and captured designs.

Also, for C source code there is the coccinelle tool:

Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle was initially targeted towards performing collateral evolutions in Linux. Such evolutions comprise the changes that are needed in client code in response to evolutions in library APIs, and may include modifications such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure. Beyond collateral evolutions, Coccinelle is successfully used (by us and others) for finding and fixing bugs in systems code.

none
A: 

A Coccinelle implementation of the above rule would be:

@@
identifier cnt;
expression max,E;
@@

cnt = 0
... when != cnt=E
-while (cnt < max)
+for (cnt=0; cnt < max; cnt++)
{
  ...
-  cnt++;
}

For the C code:

int main () {
  int i=0;
  printf ("hello\n");
  while(i < arrlen) {
    printf("element %d: %f\n",i,arr[i]);
    ++i;
  }
}

it gives

int main () { int i=0; printf ("hello\n"); for (i = 0; i < arrlen; i++) { printf("element %d: %f\n",i,arr[i]); } }

The ... when != cnt=E allows arbitrary code between the initialization of cnt and the while loop, but checks that cnt is not redefined. A more elaborate rule could also get rid of the initialization of cnt, if it is not used between the initialization and the while loop.

julia
+1  A: 

Our DMS has already been mentioned. It has transformation rules that are, as the OP put it, "easy to read because they're written in the same language as the target source code".

Here's a link that shows a complete, detailed example of pattern-matching/transformation using DMS.

Ira Baxter