This shell script extracts a line of data from $2 whenever it contains the pattern $line. $line is constructed from the lines in file $1 using the regular expression [A-Z0-9.-]+@[A-Z0-9.-]+ (a simple email match).
#! /bin/sh
clear
for line in `cat "$1" | grep -i -o -E "[A-Z0-9.-]+@[A-Z0-9.-]+"`
do
echo `cat "$2" | grep -m 1 "\b$line\b"`
done
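For context on where the time goes: every pass through the loop starts a new grep over the full 200 MB of $2, so with roughly 50k extracted addresses the script can read on the order of 50,000 x 200 MB (about 10 TB) in the worst case; grep -m 1 stops early on a hit, but a miss still scans the whole file. As an untested sketch of the kind of single-pass alternative that might count as a quick fix (it assumes duplicate addresses can be collapsed, and it prints every matching line of $2 in $2's order rather than the first match per address, without the \b anchoring):

#!/bin/sh
# Pull the addresses out of $1 once, de-duplicate them, then let a single grep
# scan $2 one time using the whole list as fixed-string patterns.
# (-F = fixed strings, -f - = read the pattern list from stdin; GNU grep.)
grep -i -o -E "[A-Z0-9.-]+@[A-Z0-9.-]+" "$1" | sort -u | grep -i -F -f - "$2"

With GNU grep this keeps the whole job to one read of $1 and one read of $2.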
File $1 has short lines of data (< 100 chars) and contains approx. 50k lines (approx. 1-1.5 MB). File $2 has slightly longer lines of text (roughly 80-200 chars) and 2M+ lines (approx. 200 MB).
The desktops this runs on have plenty of RAM (6 GB) and Xeon processors with 2-4 cores.
Are there any quick fixes to increase performance? Currently it takes 1-2 hours to run completely (and write the output to another file).
NB: I'm open to all suggestions, but we're not in a position to completely rewrite the whole system, etc. In addition, the data comes from a third party and is prone to random formatting.