tags:
views: 641
answers: 7

I want to search for a line in a file, using regex, inside a Perl script.

Assuming it is on a system with grep installed, is it better to:

  • call the external grep through an open() command
  • open() the file directly and use a while loop and an if ($line =~ m/regex/)?
+8  A: 

In a modern Perl implementation, the regexp code should be just as fast as in grep, but if you're concerned about performance, why don't you simply try it out? From a code cleanliness and robustness standpoint, calling an external command line tool is definitely not good.

Michael Borgwardt
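Taking up the "try it out" suggestion: Perl's core Benchmark module makes the comparison straightforward. A minimal sketch, with an invented sample file and pattern, and assuming a grep that supports -m:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Build a throwaway sample file so the comparison is self-contained.
my ($fh, $file) = tempfile();
print {$fh} "filler line $_\n" for 1 .. 10_000;
print {$fh} "a zebra appears\n";
close $fh;

my $regex = qr/zebra/;

cmpthese(-1, {
    perl_regex => sub {
        # Pure Perl: scan the file, stop at the first match.
        open my $in, '<', $file or die "open $file: $!";
        while (<$in>) { last if /$regex/ }
        close $in;
    },
    external_grep => sub {
        # Shell out; -m 1 stops grep at the first match.
        my $hit = `grep -m 1 zebra $file`;
    },
});
```

On small files the external version pays process-startup cost on every call, which is exactly the kind of difference this measures.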
+3  A: 

It depends.

  • working inside Perl saves you the process startup time and other related resource costs.
  • grep is probably faster than doing the same job in Perl, but not hugely so.

I'd say to do it in Perl unless performance forces you to optimize.

Darron
+5  A: 

One thing to be careful of with grep: In recent Linux distributions, if your LANG environment variable defines a UTF-8 type (e.g. mine is LANG=en_GB.UTF-8) then grep, sed, sort and probably a bunch of other text-processing utilities run about 10 times more slowly. So watch out for that if you are doing performance comparisons. I alias my grep command now to:

LANG= LANGUAGE= /bin/grep

Edit: Actually, it's more like 100 times more slowly.

Adrian Pronk
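Adrian's locale tip can also be applied from inside a Perl script, by overriding the locale variables for just the child process. A minimal sketch, with invented file contents and pattern, and assuming a grep that supports -m:

```perl
use strict;
use warnings;

# Build a small sample file so the sketch is self-contained.
open my $out, '>', 'data.txt' or die "data.txt: $!";
print {$out} "one\npattern two\nthree\n";
close $out;

# Force the C locale for the child grep so the UTF-8 slowdown
# described above does not bite; local restores %ENV afterwards.
my $line = do {
    local $ENV{LANG}   = 'C';
    local $ENV{LC_ALL} = 'C';
    `grep -m 1 pattern data.txt`;
};
print $line;
```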
+3  A: 

You don't need to open the file explicitly.

my $regex = qr/blah/;
while (<>) {          # <> reads the files named in @ARGV, or STDIN if none
  if (/$regex/) {
    print;
    exit;
  }
}
print "Not found\n";

Since you seem concerned about performance, I let the match and print use the default $_ provided by not assigning <> to anything, which is marginally faster. In normal production code,

while (my $line = <>) {
  if ($line =~ /$regex/) {
    print $line;
    exit;
  }
}

would be preferred.

Edit: This assumes that the file to check is given on the command line, which, I just noticed, you did not say applies in your case.

Dave Sherohman
It is marginally faster, but using the global $_ is not worth the 10% (measured searching for /zebra/ in a 240k-line /usr/dict/words). If you declare $line outside the while loop, the performance hit goes away.
Schwern
You can always set @ARGV manually and it will still work.
Chas. Owens
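A sketch of that approach, with an invented demo file, so the <> loop works without any command-line arguments:

```perl
use strict;
use warnings;

# Create a small demo file so the sketch is self-contained;
# in real use you would put your own filename in @ARGV.
open my $out, '>', 'demo.txt' or die "demo.txt: $!";
print {$out} "first line\n", "blah here\n", "last line\n";
close $out;

@ARGV = ('demo.txt');    # same effect as passing demo.txt on the command line
my $regex = qr/blah/;
my $found;
while (my $line = <>) {
    if ($line =~ /$regex/) {
        $found = $line;
        last;
    }
}
print defined $found ? $found : "Not found\n";
```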
+2  A: 

It depends. If you want to optimize for development time,

$line = `grep '$regex' file | head -n 1`;

is clearly the thing to do.

But it comes at the cost of having to start external processes, depending on things besides perl being installed, and losing the opportunity to do detailed error reporting when something goes wrong.

ysth
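If you do shell out, a list-form piped open is a safer sketch than backticks: it avoids shell quoting problems and lets you report errors from grep's exit status. The file and pattern here are invented, and a grep supporting -m is assumed:

```perl
use strict;
use warnings;

# Build a sample file so the sketch is self-contained; in practice
# you would grep your real file for your real pattern.
open my $out, '>', 'sample.txt' or die "sample.txt: $!";
print {$out} "alpha\nblah beta\ngamma\n";
close $out;

# List-form piped open: no shell is involved, so the pattern and
# filename are passed to grep exactly as written.
open my $grep, '-|', 'grep', '-m', '1', 'blah', 'sample.txt'
    or die "Can't start grep: $!";
my $line = <$grep>;
close $grep;
my $status = $? >> 8;    # 0 = match, 1 = no match, >1 = error
die "grep failed with status $status" if $status > 1;
print defined $line ? $line : "Not found\n";
```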
A: 

I once wrote a script to search for some regular expressions across some big text files (about 10 MB each). I did it with Perl regexes and noticed it was going quite slowly. So I tried running grep from the script, and the speed boost was quite considerable. So, in my own experience, Perl's built-in regexes are slower than grep, though you'll probably only notice it with large files. My advice: try it both ways and see how it goes.

Marc
+1  A: 
sed '/pattern/q' file

This prints the file up to and including the first matching line: sed auto-prints each line, and q quits after the match.