My SQL query looks like this...

select * FROM mail_headers a LEFT JOIN mail_headers_body b ON a.mailid=b.id blah blah blah

I can select the lines before and after the result using grep's -A and -B switches. But how do I find the 4 words before and after the selected text?

grep -i 'JOIN mail_headers_body' mysql-gen.log

This returns the query shown above. But what I want is the 4 "words" before the selected text, i.e.

FROM mail_headers a LEFT

And 4 (or 5) words after...

b ON a.mailid=b.id blah
A: 

You can use awk:

$ echo "$sql" | awk -F"JOIN mail_headers_body" '{
    # a = words before the phrase, b = words after it
    m=split($1,a," "); n=split($2,b," ")
    print a[m-3],a[m-2],a[m-1],a[m]; print b[1],b[2],b[3],b[4] }'

This assumes there is only one "JOIN mail_headers_body" per line. Otherwise, use a for loop to iterate over the fields.
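
A rough sketch of that for loop, assuming a made-up line that contains the phrase twice (the UNION query and the mail_archive table are only for illustration). Each pair of neighbouring fields surrounds one occurrence:

```shell
# Hypothetical sample line with two occurrences of the search phrase.
sql='select * FROM mail_headers a LEFT JOIN mail_headers_body b ON a.mailid=b.id UNION select * FROM mail_archive c INNER JOIN mail_headers_body d ON c.mailid=d.id blah'

context=$(printf '%s\n' "$sql" | awk -F'JOIN mail_headers_body' '{
    # Field i ends just before occurrence i; field i+1 starts right after it.
    for (i = 1; i < NF; i++) {
        m = split($i, a, " "); n = split($(i + 1), b, " ")
        before = ""; after = ""
        # Last 4 words of the left field (or fewer if it is short).
        for (j = (m > 4 ? m - 3 : 1); j <= m; j++) before = before " " a[j]
        # First 4 words of the right field (or fewer if it is short).
        for (j = 1; j <= 4 && j <= n; j++) after = after " " b[j]
        print "BEFORE:" before; print "AFTER:" after
    }
}')
printf '%s\n' "$context"
```

For the sample line this prints one BEFORE/AFTER pair per occurrence: "FROM mail_headers a LEFT" / "b ON a.mailid=b.id UNION", then "FROM mail_archive c INNER" / "d ON c.mailid=d.id blah".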

ghostdog74
If I change it to the following, it prints $mytable as is, and not the actual table name: `cat mysql-gen.log | awk -vRS="$mytable" 'NR==1{for(i=NF-3;i<=NF;i++) print "$mytable " $i}'` I am trying to use this statement in a shell script.
shantanuo
Because you are not passing the shell variable to awk correctly. Use -v: `awk -vRS="$mytable" -v mytable="$mytable" '{.... print mytable" "$i}'`
ghostdog74
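
A small sketch of the -v point, assuming a made-up input line and a hypothetical awk variable name `tbl`. The single quotes around the awk program stop the shell from expanding `$mytable`, so the value has to be handed over explicitly:

```shell
# Hypothetical table name, standing in for a value set elsewhere in the script.
mytable='mail_headers_body'

# -v copies the shell value into the awk variable "tbl" (a name chosen for this sketch).
labelled=$(printf '%s\n' 'a LEFT JOIN mail_headers_body b ON a.mailid=b.id' |
    awk -v tbl="$mytable" '{ for (i = 1; i <= NF; i++) if ($i == tbl) print tbl " is field " i }')
printf '%s\n' "$labelled"
```

This prints "mail_headers_body is field 4". Writing "$mytable" inside the awk program would instead be treated as the literal text, as in the comment above.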
Does it return all the instances? In my case, it returns only the first line it finds.
shantanuo
Then change the approach. Use FS instead. See my edit.
ghostdog74
+2  A: 

Try this:

grep -Eio '( *[^ ]* *){4}JOIN mail_headers_body( *[^ ]* *){4}'

It should give you this output:

 FROM mail_headers a LEFT JOIN mail_headers_body b ON a.mailid=b.id blah
Dennis Williamson
I removed the -i switch because my table names are case sensitive. I tried adding the -w switch because I want to select only that specific word. Why does the table mail_headers_body_sent appear? I am looking only for the table mentioned above.
shantanuo
Added the word boundary, like `\<mail_headers_body\>`.
shantanuo
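
A sketch of the word-boundary fix, assuming GNU grep (`\<` and `\>` are GNU extensions) and two made-up log lines, the second of which uses the longer table name:

```shell
# Hypothetical log lines: the second one mentions mail_headers_body_sent.
log='select * FROM mail_headers a LEFT JOIN mail_headers_body b ON a.mailid=b.id blah
select * FROM mail_headers a LEFT JOIN mail_headers_body_sent s ON a.mailid=s.id blah'

# \> matches only at the end of a word; "_" is a word character,
# so the name followed by "_sent" is rejected.
matches=$(printf '%s\n' "$log" | grep -Eo '( *[^ ]* *){4}JOIN mail_headers_body\>( *[^ ]* *){4}')
printf '%s\n' "$matches"
```

Only the first line should produce a match here. Without `\>` the second line matches as well, because mail_headers_body is a prefix of mail_headers_body_sent.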
Very clean, very nice. Well done! +1
cdburgess
A: 

Here is a combination of cat, grep, and awk in a one-liner:

cat YOURFILE | grep -i 'JOIN mail_headers_body' | awk '{ before=""; after=""; for(i=1;i<=NF;i++){ if($i ~ /JOIN/) { a = i } } for(x=4;x!=0;x--){ before=before" "$(a-x)} a+=3; for(x=0;x<4;x++) { after=after" "$(a+x) } print "BEFORE: "before" \nAFTER: "after  }'

Update: I just removed a--; from the awk, because I realized it was skipping the first word before JOIN, which is LEFT.

cdburgess
More than 4 words are returned in before and after
shantanuo
Based on the query you show in your post, I get the following back: BEFORE: * FROM mail_headers a AFTER: b ON a.mailid=b.id blah. That is 4 words. A word is defined as a group of characters with a space on both sides, so a.mailid=b.id is one word.
cdburgess
After the update (removing the a--;) it now returns the following: BEFORE: FROM mail_headers a LEFT AFTER: ON a.mailid=b.id blah blah
cdburgess
UUOC (useless use of cat), and `grep -i` is equivalent to `IGNORECASE=1` in gawk. Essentially, a single awk command will do.
ghostdog74
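
One way the single-awk idea might look, as a sketch only: the awk pattern plays the role of grep, and the action anchors on the two-word phrase rather than on JOIN alone. It is case sensitive and assumes words are separated by whitespace; the sample line stands in for a line of the log.

```shell
# Hypothetical sample line standing in for one line of mysql-gen.log.
line='select * FROM mail_headers a LEFT JOIN mail_headers_body b ON a.mailid=b.id blah blah blah'

result=$(printf '%s\n' "$line" | awk '
# The pattern filters lines like grep would; the action does the field arithmetic.
/JOIN mail_headers_body/ {
    for (i = 1; i < NF; i++) {
        if ($i == "JOIN" && $(i + 1) == "mail_headers_body") {
            before = ""; after = ""
            for (x = (i > 4 ? i - 4 : 1); x < i; x++) before = before " " $x
            for (x = i + 2; x <= i + 5 && x <= NF; x++) after = after " " $x
            print "BEFORE:" before; print "AFTER:" after
        }
    }
}')
printf '%s\n' "$result"
```

For the query in the question this prints "BEFORE: FROM mail_headers a LEFT" and "AFTER: b ON a.mailid=b.id blah". Resetting before and after for each match keeps words from earlier lines out of the output.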
However, the requirement is to search for 'JOIN mail_headers_body'. I do that only in the grep. The awk statement takes the line, finds the field number of the word JOIN, and bases all field references in both directions from there. So it won't work in my particular case.
cdburgess