views: 122
answers: 3

Hi -

In my shell script, I am searching for terms listed in $sourcefile against the same $targetfile over and over.

My $sourcefile is formatted as such:

pattern1
pattern2
etc...

The inefficient loop I am currently searching with is:

for line in $(< $sourcefile); do
    fgrep $line $targetfile | fgrep "RID" >> $outputfile
done

I understand it might be possible to improve this by loading the whole $targetfile into memory, or perhaps by using awk?

Thanks

+2  A: 

A sed solution:

sed 's/\(.*\)/\/\1\/p/' $sourcefile | sed -nf - $targetfile

This transforms every line of $sourcefile to a sed pattern match command:

matchstring

to

/matchstring/p

You'd need to escape special characters to make this robust, however.
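One way to handle that escaping (a rough sketch, untested against your data) is to backslash the BRE metacharacters and the / delimiter before wrapping each line:

# escape ] [ \ . * ^ $ and / so each line matches as a literal string
sed -e 's/[][\.*^$\/]/\\&/g' -e 's/.*/\/&\/p/' $sourcefile | sed -nf - $targetfile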

Nathan Kidd
Thanks! Giving this a try now. It already seems faster than using grep, though the source file has about 4000 lines and it is searching a 300 MB target file, so I expect it will still take a while. Let's see what happens.
Ode
+1  A: 

Using awk to read in the sourcefile, then search the targetfile (untested):

nawk '
    NR == FNR {patterns[$0]++; next}
    /RID/ {
        for (pattern in patterns) {
            # since fgrep considers patterns as strings not regular expressions, 
            # use string lookup and not pattern matching ("~" operator).
            if (index($0, pattern) > 0) {
                print
                break
            }
        }
    }
' "$sourcefile" "$targetfile" > "$outputfile"

This will also work with gawk.

glenn jackman
Thanks for the suggestion, will also give this a try.
Ode
This was pretty fast, but the fgrep -f approach suggested below was more what I needed.
Ode
+5  A: 

Am I missing something, or why not just fgrep -f "$sourcefile" "$targetfile"?
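
If you still need the RID filter from your original loop (assuming that requirement still applies), a sketch would be to pipe the output through a second fgrep:

# keep only matching lines that also contain "RID", as in the original loop
fgrep -f "$sourcefile" "$targetfile" | fgrep "RID" > "$outputfile"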

Arkku
WOW! This was faster than the other two. The results seem proper, too. I mean, lightning fast. Awesome!
Ode