What is the best tool for complex (multi-line) regular expression searches of file contents, with good reporting capabilities?

I need to produce a report on a large Java/JSP code base, and I have to make some charts from it afterward.

Eclipse is rather good at searches, but it does not provide a good report of what it finds. It just shows a tree of files, whereas I would like to see a table with columns for the full match, each group, file name, file path, file date, maybe some version control information, etc. Then I can transfer this table to Excel and make the graphs that I want.

Is there some generic file system search tool that has such capabilities? Or maybe there is some Eclipse plugin that can give better reports (note that I'm stuck on Eclipse 3.1.2)?

+1  A: 

um... grep -r ?

Or ruby/perl/python, if you want to have more control over the final output; it sounds like what you're after would only be a few lines.
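As a rough illustration of the "few lines" idea, here is a minimal Python sketch that walks a source tree, applies a regex with the equivalent of the (?smi) flags, and writes one CSV row per match (file name, path, modification date, line number, full match, each group). The function names `search_tree` and `write_report`, the column names, and the `.java`/`.jsp` extension filter are assumptions made for this example; version control information is not covered.

```python
import csv
import os
import re
from datetime import datetime


def search_tree(root, pattern, extensions=(".java", ".jsp")):
    """Yield one report row per regex match in matching files under root."""
    # DOTALL | MULTILINE | IGNORECASE corresponds to the inline flags (?smi).
    regex = re.compile(pattern, re.DOTALL | re.MULTILINE | re.IGNORECASE)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Read the whole file so a match can span several lines.
            with open(path, encoding="utf-8", errors="replace") as handle:
                text = handle.read()
            mtime = datetime.fromtimestamp(os.path.getmtime(path))
            modified = mtime.isoformat(timespec="seconds")
            for match in regex.finditer(text):
                # 1-based line number of the first character of the match.
                line_no = text.count("\n", 0, match.start()) + 1
                yield [name, path, modified, line_no,
                       match.group(0), *match.groups()]


def write_report(root, pattern, out_path):
    """Write all matches to a CSV file and return the number of rows written."""
    group_count = re.compile(pattern).groups
    header = ["file_name", "file_path", "modified", "line", "match"]
    header += [f"group_{i}" for i in range(1, group_count + 1)]
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        for row in search_tree(root, pattern):
            writer.writerow(row)
            rows += 1
    return rows
```

Something like `write_report("src", r"void\s+(\w+)\(\s*(\w+)", "report.csv")` would then give a file that Excel can open directly; the directory and pattern here are placeholders.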

JasonTrue
Can grep do multi-line searches, i.e. the equivalent of the (?smi) flags?
Superfilin
pcregrep can do Perl-style multiline regexes, in -M mode.
JasonTrue
I verified that pcregrep supports -Mi mode, but I'm not sure whether -Mis has any effect.
JasonTrue
Thanks for your answer, Jason, but I think I will stick with Tim's proposal, as PowerGREP seems to be what I need for fast searching. Writing a Perl script would take me more time at the moment, though I would definitely do that if I were less constrained on time and weren't working over a slow remote desktop connection :).
Superfilin
+1  A: 

PowerGREP (on Windows) can be used to do (most of) that. You can define the format of your search results quite freely. I haven't tried yet to also add file meta information to the search results, but that should work. Not sure if you can add version control information (where would that come from?) - perhaps if you could be a bit more specific, I could check.

Other than that, why not write a small Python/Ruby/Perl script like JasonTrue suggested?

Tim Pietzcker
+2  A: 

Agent Ransack, TextPad, and UltraEdit allow you to perform regular expression searches against the file system. My favorite is Agent Ransack as you can specify regular expressions for the file names and for the content.

Mayo
+2  A: 

For searches over code bases with queries that understand the language structure, look at SD Search Engine. This tool indexes large source bases to provide very fast query response.

Queries are stated in terms of language elements (identifiers, operators, strings, ...) with constraints over the language elements (including wildcards and regexps on identifiers, strings and comments, as well as range constraints on numbers). Language whitespace and linebreaks (and comments unless you insist) are ignored.

If you want to do a plain regexp search on file character content, you can do that too, but you don't get the speed advantage of the index; it runs more like regular grep.

The interactive query result is shown in a hit window with other hits; by clicking, you can go to a window containing the full source code of a hit.

In logging mode, all hits found are written to a log file with N lines of context, where you configure N. That's probably the report you want.
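To make the "N lines of context" idea concrete, here is a small Python sketch of how hits with surrounding context could be collected from a single file's text. This is not how SD Search Engine is implemented; it is only an illustration, and the function name `hits_with_context` and its `context` parameter are made up for the example.

```python
import re


def hits_with_context(text, pattern, context=2):
    """Return (line_number, lines) per match, with `context` lines either side."""
    regex = re.compile(pattern, re.DOTALL | re.MULTILINE)
    lines = text.split("\n")
    hits = []
    for match in regex.finditer(text):
        # 0-based indexes of the first and last line touched by the match.
        first = text.count("\n", 0, match.start())
        last = text.count("\n", 0, max(match.start(), match.end() - 1))
        lo = max(0, first - context)
        hi = min(len(lines), last + context + 1)
        # Report the 1-based line where the match starts, plus its context.
        hits.append((first + 1, lines[lo:hi]))
    return hits
```

Each returned tuple could be written to a log file as-is, or flattened into one row of a spreadsheet-style report.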

Ira Baxter