I need a very specific tool for VB (or multi-language). I'm asking before I start making one myself (probably in Python).

What I need:

  • The tool must crawl a path, recursively or not, searching for a list of extensions, such as .bas, .frm, .xxx
  • Then, it has to parse those files, searching for functions, routines, etc.
  • And finally, it must output what it found.
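If you do build it yourself in Python, the crawl-and-parse steps above could be sketched roughly like this. It is a minimal sketch under stated assumptions: `find_files`, `extract_procedures`, `scan` and `Procedure` are hypothetical names, procedures are located with regular expressions rather than a real VB parser, and line continuations (` _`), conditional compilation (`#If`) and VB.net classes are not handled. The latin-1 encoding for VB6 source files is also an assumption.

```python
import os
import re
from dataclasses import dataclass

@dataclass
class Procedure:
    """Hypothetical record for one extracted Sub/Function/Property."""
    name: str
    path: str
    body: str

# Start of a procedure, with optional modifiers. Simplified: regex, not a parser.
START_RE = re.compile(
    r"^\s*(?:(?:Public|Private|Friend|Static)\s+)*"
    r"(Sub|Function|Property\s+(?:Get|Let|Set))\s+(\w+)",
    re.IGNORECASE,
)
END_RE = re.compile(r"^\s*End\s+(Sub|Function|Property)\b", re.IGNORECASE)

def find_files(root, extensions, recursive=True):
    """Yield paths under root whose extension is in `extensions` (case-insensitive)."""
    exts = {e.lower() for e in extensions}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in filenames:
            if os.path.splitext(fn)[1].lower() in exts:
                yield os.path.join(dirpath, fn)
        if not recursive:
            break  # only the top-level directory

def extract_procedures(text, path="<string>"):
    """Collect the body lines between each procedure header and its End line."""
    procs, current, lines = [], None, []
    for line in text.splitlines():
        if current is None:
            m = START_RE.match(line)
            if m:
                current, lines = m.group(2), []
        elif END_RE.match(line):
            procs.append(Procedure(current, path, "\n".join(lines)))
            current = None
        else:
            lines.append(line)
    return procs

def scan(root, extensions, recursive=True):
    procs = []
    for path in find_files(root, extensions, recursive):
        # Assumption: VB6 files are ANSI; latin-1 decodes any byte sequence.
        with open(path, encoding="latin-1") as f:
            procs.extend(extract_procedures(f.read(), path))
    return procs
```

A call such as `scan("C:/src/project", [".bas", ".frm", ".cls"])` (hypothetical path) would return one `Procedure` per routine; the form header at the top of a `.frm` file is skipped because it never matches `START_RE`.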

I base this on the idea of "reducing code redundancy", in a scenario where bad programmers write a lot of functions that do the same thing, sometimes with the same name, sometimes not. There are 4 cases:

  • Case 1: Same name, Same content.
  • Case 2: Same name, Diff content.
  • Case 3: Diff name, Same content.
  • Case 4: Diff name, Diff content.
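The four cases could be told apart by grouping the procedures twice: once by name (case-insensitively, as VB is) and once by a fingerprint of the normalized body. The sketch below assumes procedures have already been extracted as `(name, path, body)` tuples; `normalize`, `fingerprint` and `classify` are hypothetical names, and the normalization (collapse whitespace, lowercase, drop whole-line `'` comments) is an illustrative choice. Case 4 is everything not reported, and one group of identical bodies can appear under both case 1 and case 3.

```python
import hashlib
from collections import defaultdict

def normalize(body):
    """Illustrative normalization: collapse whitespace, lowercase, and drop
    blank lines and whole-line ' comments (trailing comments are kept)."""
    out = []
    for line in body.splitlines():
        s = " ".join(line.split()).lower()
        if s and not s.startswith("'"):
            out.append(s)
    return "\n".join(out)

def fingerprint(body):
    # SHA-1 is used only as a content fingerprint here, not for security.
    return hashlib.sha1(normalize(body).encode("utf-8")).hexdigest()

def classify(procs):
    """procs: iterable of (name, path, body). Returns groups of (name, path)
    for case1 (same name, same content), case2 (same name, different
    content) and case3 (different names, same content)."""
    by_name = defaultdict(list)
    by_content = defaultdict(list)
    for name, path, body in procs:
        fp = fingerprint(body)
        by_name[name.lower()].append((name, path, fp))
        by_content[fp].append((name, path))

    cases = {"case1": [], "case2": [], "case3": []}
    for items in by_name.values():
        if len(items) < 2:
            continue
        distinct = {fp for _, _, fp in items}
        group = [(n, p) for n, p, _ in items]
        cases["case1" if len(distinct) == 1 else "case2"].append(group)
    for items in by_content.values():
        if len({n.lower() for n, _ in items}) > 1:
            cases["case3"].append(items)
    return cases
```

A same-name group with a mix of identical and differing bodies lands in case 2 as a whole; splitting it further by fingerprint would be a straightforward refinement.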

So the output should be something like this:

===========================================================================
RESULT
===========================================================================
Errors:
---------------------------------------------------------------------------
==Name, ==Content --> 3: (Func(), Foo(), Bar()) In files (f,f2,f3)
!=Name, ==Content --> 2: (Func() + Func1(), Bar() + Bar1()) In Files (f4)

---------------------------------------------------------------------------
Warnings:
==Name, !=Content --> 1 (Foobar()) In Files (f19)

---------------------------------------------------------------------------

This is to give you an idea of what I need.

So, the question is: is there any tool that accomplishes something similar to this?

P.S.: Yes, we should write good code in the first place, but, you know, stuff happens.

A:

What you want is a "clone detector". These tools find copy-and-pasted code across a large set of designated files. Clones are not just of functions; they can be code blocks, data declarations, etc.

There are a variety of detectors out there, and you should know how they work before you attempt to build one of your own.

Some simply match lines for exact equivalence. While these demonstrate the basic idea, they detect poorly, because they don't account for the fact that cloned code often has variations: what people really do when making copies is clone-and-edit.
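To make the line-matching idea concrete, here is a minimal sketch; the function name `line_clones` and the window size are illustrative assumptions, not any particular tool's API. It indexes every run of `window` consecutive non-blank, whitespace-trimmed lines and reports runs that occur in more than one place.

```python
from collections import defaultdict

def line_clones(files, window=3):
    """files: dict path -> source text. Returns {tuple_of_lines: [(path, line_no), ...]}
    for each run of `window` consecutive non-blank lines found more than once."""
    index = defaultdict(list)
    for path, text in files.items():
        # Keep the original 1-based line numbers, but skip blank lines when matching.
        lines = [(i + 1, ln.strip()) for i, ln in enumerate(text.splitlines()) if ln.strip()]
        for j in range(len(lines) - window + 1):
            key = tuple(s for _, s in lines[j:j + window])
            index[key].append((path, lines[j][0]))
    return {k: v for k, v in index.items() if len(v) > 1}
```

With `window=3`, renaming a single variable on one of the three lines is enough to make the match disappear, which is exactly the clone-and-edit weakness described above.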

Some match sequences of language tokens, e.g., identifiers, keywords, literals, punctuation. These are at least relatively tolerant of whitespace changes, and they can find clones in which single tokens have been substituted for single tokens. However, because they don't understand language structure (blocks, statements, function bodies), they often match sequences that cross such structure boundaries (e.g., "} {" is often considered a clone by these tools), so they produce a rather high rate of false-positive clone reports. Some of these attempt to limit the matches to key program structures, such as complete functions, as you have more or less suggested.
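A token-based matcher can be sketched in the same spirit. This version uses hypothetical names (`normalized_tokens`, `token_clones`) and a deliberately small, incomplete VB keyword subset. It replaces every non-keyword identifier with `ID` and every literal with `LIT`, then indexes fixed-length token windows. That abstraction is what makes it tolerant of whitespace and of one-for-one token substitutions; because the windows ignore statement and block boundaries, it also shows why such tools can report matches that straddle program structure.

```python
import re
from collections import defaultdict

# Order matters: strings first (so a ' inside "..." is not a comment), then
# comments, numbers, identifiers, two-character operators, any other character.
TOKEN_RE = re.compile(r'"(?:[^"]|"")*"|\'.*|\d+(?:\.\d+)?|[A-Za-z_]\w*|<>|<=|>=|\S')

# Hypothetical, deliberately incomplete subset of VB keywords.
KEYWORDS = {"if", "then", "else", "elseif", "end", "for", "to", "step", "next",
            "dim", "as", "sub", "function", "exit", "and", "or", "not",
            "do", "loop", "while", "wend", "call", "set"}

def normalized_tokens(text):
    """Return (token_class, line_no) pairs with identifiers/literals abstracted."""
    out = []
    for line_no, line in enumerate(text.splitlines(), 1):
        for tok in TOKEN_RE.findall(line):
            if tok.startswith("'"):
                break  # a ' comment runs to the end of the line
            if tok.startswith('"') or tok[0].isdigit():
                out.append(("LIT", line_no))
            elif (tok[0].isalpha() or tok[0] == "_") and tok.lower() not in KEYWORDS:
                out.append(("ID", line_no))
            else:
                out.append((tok.lower(), line_no))  # keyword or punctuation
    return out

def token_clones(files, min_tokens=8):
    """files: dict path -> source text. Returns {token_window: [(path, start_line), ...]}
    for every window of `min_tokens` abstracted tokens seen more than once."""
    index = defaultdict(list)
    for path, text in files.items():
        toks = normalized_tokens(text)
        for i in range(len(toks) - min_tokens + 1):
            key = tuple(t for t, _ in toks[i:i + min_tokens])
            index[key].append((path, toks[i][1]))
    return {k: v for k, v in index.items() if len(v) > 1}
```

A real detector would merge overlapping windows into maximal runs; this sketch reports every window separately, so a long clone shows up as many overlapping hits.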

More sophisticated detectors match program structures. Our CloneDR (I'm the original author) is a detector that uses compiler-quality parsing to build abstract syntax trees, which capture the precise structure of the code. It does this for many languages (including VB6 and VBScript), locating clones as arbitrary functions, blocks, statements or declarations, with parameters showing how the clones vary. CloneDR can find clones in spite of formatting changes, changes in comment locations or content, and even variations where complex constructs (multiple statements or expressions) have been used as alternatives to simple ones (e.g., a single statement or a literal). While it tends to have a high detection rate (it usually finds 10-20% removable redundancy!), its false-positive rate tends to be considerably lower than that of token-based detectors. You can see sample reports for a variety of different languages at the link above.
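The core of structural matching can be illustrated with Python's standard `ast` module, used here only as a stand-in because the standard library has no VB parser; this sketch therefore groups Python functions, not VB ones. It is not CloneDR's algorithm: it finds only functions whose syntax trees are identical after blanking out names and literal values, with none of the parameterized near-miss matching described above. The names `Abstractor`, `structural_key` and `ast_clones` are hypothetical. Comments and formatting never reach the tree, so they are ignored for free. Requires Python 3.8+ (`ast.Constant`).

```python
import ast
import copy
from collections import defaultdict

class Abstractor(ast.NodeTransformer):
    """Blank out variable names, parameter names and literal values so that
    only the syntactic structure of a function remains."""
    def visit_Name(self, node):
        node.id = "_"
        return node

    def visit_arg(self, node):
        node.arg = "_"
        node.annotation = None
        return node

    def visit_Constant(self, node):
        node.value = None
        return node

def structural_key(func):
    # Work on a copy so the caller's tree is not mutated.
    tree = Abstractor().visit(copy.deepcopy(func))
    tree.name = "_"
    tree.returns = None
    # Position attributes (lineno etc.) are excluded from ast.dump by default.
    return ast.dump(tree, annotate_fields=False)

def ast_clones(sources):
    """sources: dict label -> Python source. Returns groups of 'label:function'
    whose abstracted syntax trees are identical."""
    groups = defaultdict(list)
    for label, src in sources.items():
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                groups[structural_key(node)].append(f"{label}:{node.name}")
    return [g for g in groups.values() if len(g) > 1]
```

Two functions that differ in names, constants, spacing and comments but share the same statement structure end up in one group, while a function with a different shape does not.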

See Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach, which explicitly discusses different approaches and their benefits, and compares a large number of detectors including CloneDR.

EDIT October 2010: ... When I first wrote this response, I assumed the OP was interested in VB.net, which CloneDR didn't handle. We've since added VB.net, VB6 and VBScript capability to CloneDR. (Parsing VB.net in its modern form is a lot messier than one might imagine for a "simple"(!) language like Visual Basic.)

Ira Baxter