Hi all

I'm running a grep to find any *.sql file that has the word select followed by the word customerName followed by the word from. This select statement can span many lines and can contain tabs and newlines.

I've tried a few variations on the following:

$ grep -liIr --include="*.sql" --exclude-dir="\.svn*" --regexp="select[a-zA-Z0-9+\n\r]*customerName[a-zA-Z0-9+\n\r]*from"

This, however, just runs forever. Can anyone help me with the correct syntax please?

+1  A: 

grep works line by line, so you can't use it to match a pattern that spans multiple lines.

man grep :

grep, egrep, fgrep, rgrep - print lines matching a pattern

Colin Hebert
Lesson learned :)
Ciaran Archer
+2  A: 

Isn't grep only for single-line searches? Maybe this question is useful for you: How can I search for a multiline pattern in a file?

splash
+1 for the cross-reference. The pcregrep command is interesting, though even it doesn't seem to look across line boundaries; standard GNU grep with -P and pcregrep with -M can find newlines when you explicitly search for them, but not when you don't include '\n' in the search string.
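For what it's worth, one way to stay with grep is to let it treat each file as a single record, so that newlines become ordinary characters. The following is only a sketch: it assumes GNU grep built with PCRE support (-P), and find_select_files is a made-up helper name. It reports any file where the three words appear in order anywhere in the file, so the match may span more than one statement, and very large files may hit PCRE's backtracking limit.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU grep with -P (PCRE) support.
# find_select_files is a hypothetical helper name, not part of grep.
find_select_files() {
    # -z : records end at NUL bytes, so a text file is read as one record
    # -P : Perl-compatible regex; (?s) lets . match newlines as well
    # -l : print only the names of matching files
    # -i : ignore case; -r : recurse into directories
    grep -rliPz --include='*.sql' --exclude-dir='.svn' \
        '(?s)\bselect\b.*?\bcustomerName\b.*?\bfrom\b' "${1:-.}"
}
# Usage: find_select_files path/to/checkout
```

With no argument the helper searches the current directory.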
Jonathan Leffler
+2  A: 

Your fundamental problem is that grep works one line at a time - so it cannot find a SELECT statement spread across lines.

Your second problem is that the regex you are using doesn't deal with the complexity of what can appear between SELECT and FROM - in particular, it omits commas, full stops (periods) and blanks, but also quotes and anything that can be inside a quoted string.

I would likely go with a Perl-based solution, having Perl read 'paragraphs' at a time and applying a regex to that. The downside is having to deal with the recursive search - there are modules to do that, of course, including the core module File::Find.

In outline, for a single file:

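$/ = "\n\n";    # Paragraphs

while (<>)
{
     # /s lets . match newlines within the paragraph; /i ignores case
     if ($_ =~ m/SELECT.*customerName.*FROM/si)
     {
         print "$ARGV\n";    # name of the file currently being read
         close(ARGV);        # go to next file
     }
}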

That needs to be wrapped into a sub that is then invoked by the methods of File::Find.

Jonathan Leffler
+1 for the suggestions and the thoughts on the other characters.
Ciaran Archer
A: 

Hi, I am not very good with grep, but your problem can be solved with awk. Try:

awk '/select/,/from/' *.sql

The command above prints each range of lines from a line containing "select" through the next line containing "from". Note that the match is case-sensitive. You then need to check whether the printed statements contain "customername". For this you can pipe the result into awk or grep again.
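Putting the two steps together might look roughly like this. It is only a sketch: files_with_customername is a made-up function name, tolower() is used so that SELECT and select are both caught, and the loop does not recurse into subdirectories by itself.

```shell
#!/usr/bin/env bash
# Sketch only: files_with_customername is a hypothetical helper name.
files_with_customername() {
    local f
    for f in "$@"; do
        # Step 1: awk prints every select..from range, ignoring case.
        # Step 2: grep -q checks those lines for the column name.
        if awk 'tolower($0) ~ /select/, tolower($0) ~ /from/' "$f" |
               grep -qi 'customername'; then
            printf '%s\n' "$f"
        fi
    done
}
# Usage: files_with_customername *.sql
```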

Amit
A: 

Aside from the regex, you may want to look into ack, which automatically excludes the .svn directories for you, and which can limit to only .sql files with --sql.

Andy Lester