I'm trying to create an app to search my company's ColdFusion codebase. I'd like to be able to do intelligent searches, for example: find where a function is defined (and not hit everywhere the function is called). In order to do this, I'd need to parse the ColdFusion code to identify things like function declarations, function calls, database queries, etc.

I've looked into using lex and yacc, but I've never used them before and the learning curve seems very steep. I'm hoping there is something already out there that I could use. My other option is a mess of difficult-to-maintain regex-spaghetti code, which I want to avoid.

A: 

None exists that I know of. Since ColdFusion is more like scripts than code, I'd imagine it would be hard to write a parser for it.

ColdFusion Builder can parse CFM/CFC to an outline in Eclipse. Maybe you can do some research on whether a CF Builder plugin can do what you want to do.

Henry
Being script-like doesn't mean it is hard to write a parser for it. Any language is represented by a set of strings. Parsers parse sets of strings described implicitly by the procedural code that comprises the parser, or explicitly by the grammar rules that drive the parser if so designed. Defining ColdFusion to a grammar-driven parser generator is more a matter of getting a good description of ColdFusion than anything else.
Ira Baxter
+1  A: 

Writing parsers for real languages is usually difficult because they contain constructs that Lex and Yacc often don't handle well, e.g., the language isn't LALR(1). ColdFusion might be easier than some because of its XML-like style.
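To illustrate why the tag-based style helps, here is a minimal Python sketch that lists function definitions by scanning for opening `<cffunction>` tags and reading their `name` attribute. It is not a real CFML parser: the helper name `find_function_definitions` and the sample source are made up for illustration, it assumes attribute values contain no `>` character, and it ignores functions declared in `<cfscript>` syntax.

```python
import re

# Opening CFML tag, e.g. <cffunction name="getUser" access="public">.
# Assumption: no '>' appears inside attribute values.
TAG_RE = re.compile(r"<(cf\w+)\b([^>]*)>", re.IGNORECASE)
# Attribute pairs written as name="value" or name='value'.
ATTR_RE = re.compile(r"""(\w+)\s*=\s*(["'])(.*?)\2""", re.DOTALL)


def find_function_definitions(source):
    """Return a list of (function_name, line_number) for each <cffunction> tag."""
    results = []
    for match in TAG_RE.finditer(source):
        if match.group(1).lower() != "cffunction":
            continue  # some other tag, e.g. <cfset> or <cfreturn>
        # findall yields (attribute, quote_char, value) tuples.
        attrs = {k.lower(): v for k, _, v in ATTR_RE.findall(match.group(2))}
        if "name" in attrs:
            line = source.count("\n", 0, match.start()) + 1
            results.append((attrs["name"], line))
    return results


# Hypothetical sample: one definition of getUser and one call to it.
sample = '''<cfcomponent>
  <cffunction name="getUser" access="public">
    <cfreturn loadUser(arguments.id)>
  </cffunction>
  <cfset user = getUser(42)>
</cfcomponent>'''

print(find_function_definitions(sample))
```

Because only the `<cffunction>` tag is inspected, the call `getUser(42)` inside `<cfset>` is not reported. Anything beyond this (expressions, `<cfscript>` blocks, queries) is where a proper grammar starts to pay off.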

If you want to build a sophisticated parser quickly, you might consider using the DMS Software Reengineering Toolkit, which has GLR parsing support.

If you want to avoid writing your own or hacking all those regexps, you could consider the Source Code Search Engine. It has language-sensitive parsers and can search across very large source code bases very quickly. One of its "language-sensitive" parsers is AdhocText, which is designed to handle "generic" programming languages such as those you might find in a random programming book; it even understands XML-like tags such as ColdFusion has. You can download an evaluation version from the link provided to try it.

EDIT 4/3/2010: A recent feature added to the SCSE is the ability to tag definitions and uses separately. That would address the OP's desire to find the function definition rather than all the calls.

Ira Baxter
+2  A: 

I used the source to CFEclipse, since it is open source and has a parser. Not sure about the legality of this if we were selling/redistributing it, but we're only using it for an internal tool.

Kip