ansaurus

Question

Identifying frequent formulas in a codebase

Answer 1

A:

You might want to look into tag-cloud generators. I couldn't find any source code in the minute I spent looking, but here's an online one: http://tagcloud.oclc.org/tagcloud/TagCloudDemo. It probably won't work for you as-is, though, since it uses spaces as delimiters.
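A tag-cloud generator is essentially a frequency counter. Here is a minimal Python sketch of the same idea that uses a delimiter other than spaces. It assumes, hypothetically, that formulas are separated by `;`, and it counts identical formulas after stripping whitespace. `formula_frequencies` is an illustrative name, not an existing tool.

```python
from collections import Counter


def formula_frequencies(source: str, delimiter: str = ";") -> Counter:
    """Count how often each formula occurs, splitting on `delimiter`
    (an assumption about how formulas are separated) instead of spaces."""
    counts: Counter = Counter()
    for chunk in source.split(delimiter):
        # Drop all whitespace so "X + Y" and "X+Y" are counted as one formula.
        formula = "".join(chunk.split())
        if formula:
            counts[formula] += 1
    return counts


if __name__ == "__main__":
    code = "X + Y; Trim(Name); X+Y; Y+X; Trim( Name )"
    for formula, n in formula_frequencies(code).most_common():
        print(n, formula)
```

Note that `X+Y` and `Y+X` are still counted as different formulas. Plain text counting cannot tell that they are equivalent.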

Adam Shiemke 2010-07-01 19:13:24

Answer 2

A:

I think you could use an existing full-text indexer like Lucene and implement your own Analyzer and Tokenizer specific to your formula language.

You would then be able to run queries and see the most frequently used formulas, which ones appear next to each other, and so on.
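This plain-Python stand-in (not Lucene's API) illustrates what a formula-specific tokenizer buys you. It splits formulas into tokens using a hypothetical token grammar, then counts adjacent token pairs. That roughly answers the "which ones appear next to each other" question. The regular expression and function names are assumptions. In Lucene itself, this splitting logic would live in a custom Tokenizer.

```python
import re
from collections import Counter

# Hypothetical token grammar for a spreadsheet-like formula language:
# identifiers/function names, numeric literals, and single-character
# operators. Characters that match none of these are silently skipped.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+(?:\.\d+)?|[-+*/^(),<>=&]")


def tokenize(formula: str) -> list[str]:
    """Split one formula into tokens, ignoring whitespace."""
    return TOKEN_RE.findall(formula)


def ngram_counts(formulas: list[str], n: int = 2) -> Counter:
    """Count every run of n consecutive tokens across all formulas."""
    counts: Counter = Counter()
    for formula in formulas:
        tokens = tokenize(formula)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    sample = ["Trim(Name)", "Upper(Trim(Name))", "Trim(City)"]
    for gram, k in ngram_counts(sample).most_common(3):
        print(k, " ".join(gram))
```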

Here's a quick article to get you started:

Lucene Analyzer, Tokenizer and TokenFilter

GalacticJello 2010-07-01 19:19:21

Answer 3

A:

String matching only catches the low-hanging fruit: the obvious cases. The harder cases are where you're doing the same thing but in a different order. For example, suppose you have:

X+Y
Y+X

A string-matching approach won't realize that those are effectively the same. If you want to go a bit deeper, I think you need to parse the formulas into an AST and compare the ASTs. If you did that, you could see that the trees are actually the same, since the binary operator '+' is commutative.
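Here is a minimal sketch of the AST comparison. It assumes the formulas can be written in Python's infix expression syntax, so Python's standard `ast` module can stand in for a real formula parser. The code flattens and sorts the operands of `+` and `*`, which means it treats both operators as commutative and associative. That holds for numeric arithmetic but not for string concatenation. With that normalization, `X+Y` and `Y+X` produce the same canonical tree. `canonical` and `same_formula` are illustrative names.

```python
import ast

# Operators treated as commutative and associative (assumes numeric operands).
COMMUTATIVE = (ast.Add, ast.Mult)


def _collect(node: ast.AST, op_type: type, out: list) -> None:
    """Flatten a chain such as a + b + c into its operands [a, b, c]."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, op_type):
        _collect(node.left, op_type, out)
        _collect(node.right, op_type, out)
    else:
        out.append(node)


def canonical(node: ast.AST):
    """Turn an expression AST into a hashable nested tuple in which the
    operands of + and * are flattened and sorted, so their order is ignored."""
    if isinstance(node, ast.Expression):
        return canonical(node.body)
    if isinstance(node, ast.BinOp):
        op = type(node.op).__name__
        if isinstance(node.op, COMMUTATIVE):
            operands: list = []
            _collect(node, type(node.op), operands)
            # Sort by repr so different node kinds never need direct comparison.
            return (op, *sorted((canonical(o) for o in operands), key=repr))
        # Non-commutative operators (-, /, ...) keep their operand order.
        return (op, canonical(node.left), canonical(node.right))
    if isinstance(node, ast.Name):
        return ("Name", node.id)
    if isinstance(node, ast.Constant):
        return ("Const", node.value)
    if isinstance(node, ast.Call) and not node.keywords:
        return ("Call", canonical(node.func), *(canonical(a) for a in node.args))
    raise ValueError(f"unsupported syntax in this sketch: {ast.dump(node)}")


def same_formula(a: str, b: str) -> bool:
    """True if the two formulas have the same canonical AST."""
    return canonical(ast.parse(a, mode="eval")) == canonical(ast.parse(b, mode="eval"))


if __name__ == "__main__":
    print(same_formula("X+Y", "Y+X"))  # True
    print(same_formula("X-Y", "Y-X"))  # False
```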

You could also apply reduction rules to rewrite complex expressions into simpler ones, for example:

(X * A) + (X * B)
X * (A + B)

Those are also the same! String matching won't help you there.
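This sketch implements a single reduction rule. It again uses Python's `ast` module as a stand-in parser, which is an assumption about the formula syntax. An `ast.NodeTransformer` rewrites `(P * A) + (P * B)` into `P * (A + B)` from the bottom up. The rule is algebraically valid for real numbers, but with floating-point values the two forms can round differently. `FactorCommonTerm` and `reduce_formula` are hypothetical names.

```python
import ast


class FactorCommonTerm(ast.NodeTransformer):
    """One reduction rule: (P * A) + (P * B)  ->  P * (A + B).

    Only the case where both products share the same left operand P is
    handled; a real reducer would need many more rules."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # reduce the children first (bottom-up)
        left, right = node.left, node.right
        if (
            isinstance(node.op, ast.Add)
            and isinstance(left, ast.BinOp) and isinstance(left.op, ast.Mult)
            and isinstance(right, ast.BinOp) and isinstance(right.op, ast.Mult)
            # ast.dump omits line/column info, so this compares structure only.
            and ast.dump(left.left) == ast.dump(right.left)
        ):
            summed = ast.BinOp(left=left.right, op=ast.Add(), right=right.right)
            return ast.BinOp(left=left.left, op=ast.Mult(), right=summed)
        return node


def reduce_formula(src: str) -> str:
    """Parse a formula, apply the rule, and print it back (Python 3.9+)."""
    tree = FactorCommonTerm().visit(ast.parse(src, mode="eval"))
    return ast.unparse(tree.body)


if __name__ == "__main__":
    print(reduce_formula("(X * A) + (X * B)"))  # X * (A + B)
```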

Parse into an AST
Reduce and optimize the functions
Compare the resulting AST to other ASTs

If you find a match, replace the duplicates with a call to a shared function.
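The three steps can be sketched end to end as follows. The sketch assumes Python-compatible formula syntax and a hypothetical mapping from source locations to formula text. Normalization is deliberately minimal: it only orders the two operands of each `+` and `*`, so longer chains such as `a+b+c` versus `c+b+a` are not fully normalized. Formulas with identical normalized ASTs are grouped as candidates for a shared function.

```python
import ast
from collections import defaultdict


class SortCommutative(ast.NodeTransformer):
    """Normalize step: order the two operands of + and * so that
    X+Y and Y+X produce the same tree (binary swaps only)."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)
        if isinstance(node.op, (ast.Add, ast.Mult)):
            if ast.dump(node.left) > ast.dump(node.right):
                node.left, node.right = node.right, node.left
        return node


def find_duplicate_formulas(formulas: dict[str, str]) -> list[list[str]]:
    """`formulas` maps a location (hypothetical, e.g. "report:12") to formula
    text. Returns groups of locations whose normalized ASTs are identical."""
    groups: defaultdict = defaultdict(list)
    for location, text in formulas.items():
        tree = ast.parse(text, mode="eval")   # 1. parse into an AST
        tree = SortCommutative().visit(tree)  # 2. normalize
        groups[ast.dump(tree)].append(location)  # 3. compare via a structural key
    return [locs for locs in groups.values() if len(locs) > 1]


if __name__ == "__main__":
    code = {"a:1": "Price * Qty", "b:7": "Qty*Price", "c:3": "Price - Qty"}
    for group in find_duplicate_formulas(code):
        print("candidates for a shared function:", group)
```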

justin.m.chase 2010-09-23 17:31:46

Also, if you have existing functions like "Trim", you could get the AST of that function and check whether it matches subtrees in the functions you're evaluating.
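Here is a sketch of that subtree search. It assumes the existing function can be written as a one-line Python `def` whose body is a single `return` expression. The `tax` and `sq` helpers in the examples are made up. The helper's parameters act as wildcards that must bind consistently, so `x * x` matches `A * A` but not `A * B`. The function names are illustrative.

```python
import ast


def matches(pattern: ast.AST, node, params: set, bindings: dict) -> bool:
    """Structurally match `pattern` against `node`. Names listed in `params`
    are wildcards that must bind to the same subtree every time they occur."""
    if not isinstance(node, ast.AST):
        return False
    if isinstance(pattern, ast.Name) and pattern.id in params:
        key = ast.dump(node)
        return bindings.setdefault(pattern.id, key) == key
    if type(pattern) is not type(node):
        return False
    for field, p_val in ast.iter_fields(pattern):
        n_val = getattr(node, field, None)
        if isinstance(p_val, ast.AST):
            if not matches(p_val, n_val, params, bindings):
                return False
        elif isinstance(p_val, list):
            if not isinstance(n_val, list) or len(p_val) != len(n_val):
                return False
            for p_item, n_item in zip(p_val, n_val):
                if isinstance(p_item, ast.AST):
                    if not matches(p_item, n_item, params, bindings):
                        return False
                elif p_item != n_item:
                    return False
        elif p_val != n_val:  # plain values: identifiers, constants, etc.
            return False
    return True


def find_matches(helper_src: str, formula: str) -> list[str]:
    """Return the subexpressions of `formula` that compute the same thing
    as the helper's return expression (assumes `def f(...): return <expr>`)."""
    func = ast.parse(helper_src).body[0]
    params = {a.arg for a in func.args.args}
    pattern = func.body[0].value
    hits = []
    for node in ast.walk(ast.parse(formula, mode="eval")):
        # Fresh bindings for every candidate subtree.
        if isinstance(node, ast.expr) and matches(pattern, node, params, {}):
            hits.append(ast.unparse(node))
    return hits


if __name__ == "__main__":
    print(find_matches("def tax(amount, rate): return amount * rate / 100",
                       "Total + Price * 0.2 / 100"))  # ['Price * 0.2 / 100']
```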

justin.m.chase 2010-09-23 17:33:50
