When searching code for strings, I constantly run into the problem that I get meaningless, context-less results. For example, if a function call is split across 3 lines, and I search for the name of a parameter, I get the parameter on a line by itself and not the name of the function.

For example, in a file containing

...
  someFunctionCall ("test",
                    MY_CONSTANT,
                    (some *really) - long / expression);

grepping for MY_CONSTANT would return a line that looked like this:

                    MY_CONSTANT,

Likewise, in a comment block:

/////////////////////////////////////////
// FIXMESOON, do..while is the wrong choice here, because
// it makes the wrong thing happen
/////////////////////////////////////////

Grepping for FIXMESOON gives the very frustrating answer:

// FIXMESOON, do..while is the wrong choice here, because

When there are thousands of hits, single-line results are a little meaningless. What I would like is for grep to be aware of where source code statements start and stop. Something as simple as having it treat ";" as the line separator would be a good start.

Bonus points if you can make it return the entire comment block if the hit is inside a comment.

I know you can't do this with grep alone. I am also aware of the option to have grep return a certain number of lines of context. Any suggestions on how to accomplish this under Linux? FYI, my preferred languages are C and Perl.

I'm sure I could write something, but I know that somebody must have already done this.

Thanks!

+1  A: 

You could build a command line that runs grep with the options that print the filename and line number (-H and -n), pipe the results through xargs into awk to parse those columns, and then use a little script of your own to display the N lines surrounding each hit. :)
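A rough sketch of that idea, assuming a POSIX shell with grep, cut and awk. The function name `grep_around` and its default of 2 context lines are made up for illustration, and a plain `while read` loop stands in for xargs to keep it short. It is roughly what `grep -C N` already does, so treat it as a starting point for a custom display script rather than a replacement:

```shell
# grep_around PATTERN FILE [N]: print N lines (default 2) around each match.
# Hypothetical helper, not an existing tool.
grep_around() {
    pattern=$1
    file=$2
    n=${3:-2}
    # grep -n prefixes each hit with "LINENO:"; keep only the line number.
    grep -n -- "$pattern" "$file" | cut -d: -f1 |
    while read -r hit; do
        # Print the window [hit - n, hit + n], each line prefixed by its number.
        awk -v hit="$hit" -v n="$n" \
            'NR >= hit - n && NR <= hit + n { printf "%d:%s\n", NR, $0 }' "$file"
        echo "--"   # separator between hits
    done
}
```

It could be called as `grep_around MY_CONSTANT somefile.c 1`, where `somefile.c` is a placeholder for your own source file.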

Francisco Soto
+2  A: 

You can use pcregrep with the -M option (multiline matching; pcregrep is grep with Perl-compatible regular expressions). Something like:

pcregrep -M "[^;]*thingtosearchfor[^;]*;"
wsh
Cool, somehow never knew of pcregrep. Love the usage hint: `Usage: pcregrep [-ABCcDdeFfHhilLMNnoqrsuVvwx] [long options] [pattern] [files]`. Always good to know what characters are valid options!
Jefromi
...*yeah*, option bloat, what fun.
wsh
@wsh - WHAT option bloat??? They still have like 20 characters left unused!
DVK
@wsh: Nothing wrong with having that many options - just look at normal grep! I just thought it was funny to see them all presented there as if it'd help remind you which ones were available. "Hm, how to suppress messages about unreadable files... definitely one of these twenty letters... oh, it must be s!"
Jefromi
Not perfect but definitely helpful. Thanks!
NXT
You're quite welcome. :) And Jefromi, that's sort of what I was getting at.
wsh
+1  A: 

If this isn't an academic endeavour, you could just use cscope (for C code only, though). If you are willing to drop the requirement to search in comments, ctags should be enough (and it also supports Perl).

honk
+3  A: 

Here's an example using awk.

$ cat file
blah1
blah2
  function1 ("test",
                    MY_CONSTANT,
                    (some *really) - long / expression);

function2( one , two )
blah3
blah4

$ awk -vRS=")" '/function1/{gsub(".*function1","function1");print $0RT}' file
function1 ("test",
                    MY_CONSTANT,
                    (some *really)

The concept behind it: RS is awk's record separator. By setting it to ")", every record in the file is delimited by ")" instead of by a newline. This makes it easy to find "function1", since you can then "grep" each record for it. RT, a gawk extension, holds the text that matched RS, so printing $0RT puts the closing ")" back. If you don't use awk, the same concept can be applied by splitting the input on ")".
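The same record-separator trick can be pointed at ";" to get something close to what the question asks for: every statement that contains the search term, printed whole. This is only a sketch, and the function name `stmt_grep` is invented here. It sticks to POSIX awk (single-character RS, no RT), so it appends ";" unconditionally, and any matching text after the last semicolon in a file would gain one. It also knows nothing about strings or comments, so a ";" inside either will still split a record.

```shell
# stmt_grep PATTERN [FILE...]: print each ';'-terminated record containing PATTERN.
# Hypothetical helper; reads stdin when no files are given.
stmt_grep() {
    pattern=$1
    shift
    awk -v RS=';' -v pat="$pattern" '
        $0 ~ pat {
            sub(/^[ \t\n]+/, "")   # drop leading blank lines and indentation
            print $0 ";"           # re-append the ";" that RS consumed
        }
    ' "$@"
}
```

A typical call would be `stmt_grep MY_CONSTANT *.c`. Note that PATTERN is treated as an awk regular expression, and backslashes in it are subject to awk's escape processing because it is passed with -v.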

ghostdog74