views: 122
answers: 3

Hi -

In my shell script, I am searching for terms listed in $sourcefile against the same $targetfile over and over.

My $sourcefile is formatted as such:

pattern1
pattern2
etc...

The inefficient loop I am currently searching with is:

for line in $(< $sourcefile); do
    fgrep $line $targetfile | fgrep "RID" >> $outputfile
done

I understand it might be possible to improve this by loading the whole $targetfile into memory, or perhaps by using awk?

Thanks

+2  A: 

A sed solution:

sed 's/\(.*\)/\/\1\/p/' $sourcefile | sed -nf - $targetfile

This transforms every line of $sourcefile to a sed pattern match command:

matchstring

to

/matchstring/p

You'd need to escape special characters to make this robust, however.
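One way to handle that escaping (a rough sketch, untested against your data) is to backslash the BRE metacharacters and the / delimiter before wrapping each line:

# escape ] [ \ . * ^ $ and / so each line matches as a literal string
sed -e 's/[][\.*^$\/]/\\&/g' -e 's/.*/\/&\/p/' $sourcefile | sed -nf - $targetfile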

Nathan Kidd
Thanks! Giving this a try now. It already seems faster than using grep, though the source file has about 4000 lines and it is searching a 300 MB target file, so I expect it will still take a while. Let's see what happens.
Ode
+1  A: 

Using awk to read in the sourcefile, then search the targetfile (untested):

nawk '
    NR == FNR {patterns[$0]++; next}
    /RID/ {
        for (pattern in patterns) {
            # since fgrep considers patterns as strings not regular expressions, 
            # use string lookup and not pattern matching ("~" operator).
            if (index($0, pattern) > 0) {
                print
                break
            }
        }
    }
' "$sourcefile" "$targetfile" > "$outputfile"

This will also work with gawk.

glenn jackman
Thanks for the suggestion, will also give this a try.
Ode
This was pretty fast, but the fgrep -f approach suggested below was more what I needed.
Ode
+5  A: 

Am I missing something, or why not just fgrep -f "$sourcefile" "$targetfile"?
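
If you still need the RID filter from your original loop (assuming that requirement still applies), a sketch would be to pipe the output through a second fgrep:

# keep only matching lines that also contain "RID", as in the original loop
fgrep -f "$sourcefile" "$targetfile" | fgrep "RID" > "$outputfile"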

Arkku
WOW! This was faster than the other two. The results seem proper, too. I mean, lightning fast. Awesome!
Ode