I am writing something that will allow users to search through a log of theirs. Currently I have this, where $in{'SEARCH'} is the string they are searching for.

open(COMMAND, "grep \"$in{'SEARCH'}\" /home/$palace/palace/logs/$logfile | tail -n $NumLines |");
$f = <COMMAND>;
if ($f) {
    print $Title;
    print "<div id=log>\n";
    do { print $f . "<br>"; } while ($f = <COMMAND>);
    print "</div>\n";
} else {
    print $Error;
}
close(COMMAND);

However, I noticed that they could easily make the script and the grep command error out by entering a double quote (") or a backslash. Therefore I added this piece of code:

$in{'SEARCH'} =~ s|\\|\\\\|g;
$in{'SEARCH'} =~ s|"|\Q\"\E|g;

open(COMMAND, "grep \"$in{'SEARCH'}\" /home/$palace/palace/logs/$logfile | tail -n $NumLines |");
$f = <COMMAND>;
if ($f) {
    print $Title;
    print "<div id=log>\n";
    do { print $f . "<br>"; } while ($f = <COMMAND>);
    print "</div>\n";
} else {
    print $Error;
}
close(COMMAND);

However, I am still having problems. The grep command does not like the \ in its search pattern, giving errors like:

grep "\\" /home/test/palace/logs/chat.log
grep: Trailing backslash

Should I even continue trying to use the grep command? If so, what is a good Perl function for escaping the characters that trip grep up, say turning " into \" and so on? Or, instead of messing around, should I just use Perl code to accomplish the same thing, even though I've read it wouldn't be as fast as grep?


Update: 7-5-2009 5:20 PM EST


Many people contributed code, especially those attempting to be faster than the system grep. As of right now, it's still the fastest. Here is how the benchmarks go:

Using system grep:

Top   of  file: 1 wallclock secs ( 0.00 usr 0.01 sys + 0.13 cusr 0.15 csys = 0.29 CPU)
Bottom of file: 1 wallclock secs ( 0.00 usr 0.00 sys + 0.21 cusr 0.18 csys = 0.39 CPU)

Using Hynek's example (push and shift):

Top   of  file: 4 wallclock secs ( 3.78 usr + 0.19 sys = 3.97 CPU)
Bottom of file: 4 wallclock secs ( 3.86 usr + 0.19 sys = 4.05 CPU)

Using my Perl example (using the reverse function):

Top   of  file: 6 wallclock secs ( 4.76 usr + 1.45 sys = 6.21 CPU)
Bottom of file: 5 wallclock secs ( 3.93 usr + 1.44 sys = 5.37 CPU)

Using File::ReadBackwards:

Top   of  file: 11 wallclock secs (11.20 usr + 0.11 sys = 11.31 CPU)
Bottom of file: 1 wallclock secs ( 0.59 usr + 0.01 sys = 0.60 CPU)

Using xcramps's example (built-in grep):

Top   of  file: 9 wallclock secs ( 8.04 usr + 0.47 sys = 8.51 CPU)
Bottom of file: 8 wallclock secs ( 8.16 usr + 0.43 sys = 8.59 CPU)
+3  A: 

Definitely just use Perl. Keeping all the work in a single process will speed things up, and invoking a subcommand with user-specified arguments is just begging for exploits. If you do decide to stick with the subcommand, quotemeta might be worth looking at.
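For reference, a minimal sketch of what quotemeta does (the sample string here is invented, not from the question):

```perl
use strict;
use warnings;

# quotemeta backslash-escapes every non-word character, which makes a
# user-supplied string safe to interpolate into a regex (it is the
# functional form of \Q...\E).
my $search = 'a.*b "quoted"';
my $quoted = quotemeta($search);
print "$quoted\n";    # a\.\*b\ \"quoted\"
```

As BHare points out in the comment below, this also escapes metacharacters the user meant as part of a regex, so it trades regex support for safety.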

Ry4an
Nice function, but the problem is that the search should also support regular expressions; with quotemeta, * becomes \*, which then doesn't match how it should at all.
BHare
+5  A: 

I'll agree with Ry4an that going with Perl is probably sensible.

If you want to fix up your code, you should use single quotes, not double quotes, around the arguments that you don't want the shell messing with. For instance, both backticks and $(...) notation execute commands inside double quotes, and ${variable} and the like are expanded. Dealing with all those notations is too much like hard work; using single quotes is much simpler. Note that an embedded single quote needs to be replaced by the four-character sequence '\'' (that was fun!). The first quote closes the current quoted string; the backslash plus single quote gives a literal single quote; the last single quote starts a new single-quoted string. You don't need to escape any other characters.


Taking your code:

my $search = $in{SEARCH};
$search =~ s/'/'\\''/g;

open(COMMAND, "grep '$search' /home/$palace/palace/logs/$logfile | tail -n $NumLines |");
$f = <COMMAND>;
if  ($f)
{
    print $Title;
    print "<div id=log>\n";
    do { print $f."<br>";} while ($f = <COMMAND>);
    print "</div>\n";
}
else
{
    print $Error;
}
close(COMMAND);
Jonathan Leffler
This is the solution I ended up using. Single quotes were way easier than escaping the right characters. If you are looking for a solution that is strictly Perl, then check out Hynek -Pichi- Vychodil's post; he had the fastest Perl code, but Perl's default regex engine is slower than grep's. This is explained in blackkettle's comment with the link: swtch.com/~rsc/regexp/regexp1.html
BHare
+1  A: 

I did a quick Perl write-up of how it would work. I'm not sure if this is optimized or the best way, but here it is:

my $num_matches = 0;
my $logpath = "/home/$palace/palace/logs/$logfile";
open my $fh, "<", $logpath or die "Can't open [$logpath]: $!";

print $Title;
print "<div id=log>\n";

my @lines = reverse <$fh>;
foreach my $line (@lines) {
    #print "<b>$line</b> - $num_matches<br>";
    if ($line =~ m/\Q$in{'SEARCH'}\E/) {
        print $line . "<br>";
        $num_matches++;
    }
    if ($num_matches >= $NumLines) {
        last;
    }
}

print "</div>\n";

I do want to note that I ran benchmarks (http://perldoc.perl.org/Benchmark.html) on the Perl way (above) vs. my grep way (original question). These are the results:

Perl way: the code took 4 wallclock secs ( 2.89 usr + 0.96 sys = 3.85 CPU)

Grep way: the code took 1 wallclock secs ( 0.02 usr 0.04 sys + 0.31 cusr 0.14 csys = 0.51 CPU)

So I need to optimize my Perl code. Any ideas? I thought that by reading the file in reverse, it could find the matches ASAP and wouldn't need to read any further, saving it from reading the whole file into memory. However, I am guessing that the reverse function reads the whole file into memory anyway. Perhaps I should use system("tac $file") and read that, but again I am trying to avoid external command calls.
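Since I'd like to stay core-only, here is a rough sketch of reading a file from the end in fixed-size chunks, without slurping it all into memory (read_backwards and the 8192-byte block size are my own invented name and choice, not tested against the real log):

```perl
use strict;
use warnings;

# Core-only sketch: walk a file from the end in fixed-size chunks and
# hand complete lines to a callback in reverse order, without ever
# holding the whole file in memory. read_backwards is a made-up name.
sub read_backwards {
    my ($path, $callback) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    my $pos   = -s $fh;    # start at end of file
    my $chunk = 8192;      # arbitrary block size
    my $tail  = '';        # partial line carried between chunks
    while ($pos > 0) {
        my $len = $pos >= $chunk ? $chunk : $pos;
        $pos -= $len;
        seek $fh, $pos, 0 or die "seek: $!";
        read $fh, my $buf, $len;
        # split keeping the newlines; the first piece may be incomplete,
        # since its beginning lies earlier in the file
        my @lines = split /(?<=\n)/, $buf . $tail;
        $tail = shift @lines;
        $callback->($_) for reverse @lines;
    }
    $callback->($tail) if length $tail;    # first line of the file
    close $fh;
}
```

With something like this, the match loop could stop as soon as $NumLines matches have been seen, instead of reversing the whole file first.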

BHare
Don't read the entire file into an array to read backwards, use File::ReadBackwards instead.
ysth
I require only libraries that are included by default.
BHare
Update: I have come to the conclusion that not expanding my code to allow some CPAN modules is silly. The reason I originally decided not to is that I am writing a panel in webmin, and I would like it to install like a normal webmin module, which to my understanding means the end user would have to install the required CPAN modules manually. Instead I have decided to create a directory called cpan, download the lone .pm file there, and reference it from my code.
BHare
Even with File::ReadBackwards, it takes 16 seconds for Perl to read a match that's deep in the file, whereas the grep | tail combo takes 0.1 seconds. Grrrrr
BHare
It's possible that the discrepancy in time has something to do with the regex engine you are using. The grep regex engine employs the Thompson algorithm for regex compilation and search, which is roughly linear in the input. However, the Perl (and Ruby, PHP, Python, etc.) implementations employ an approach that (for different reasons) requires exponential time on certain inputs: http://swtch.com/~rsc/regexp/regexp1.html
blackkettle
Look at my code (http://stackoverflow.com/questions/1083453/grep-vs-perl-should-i-mod-this-to-work-or-should-i-just-replace-it-with-perl-co/1083721#1083721) and try comparing it with your grep approach.
Hynek -Pichi- Vychodil
Try changing your foreach statement to a while, and remove the reverse call. What your code does is basically a) load the entire file into memory, and b) reverse it. You should use `while` instead.
Igor
Which version of Perl?
Brad Gilbert
Where are these defined ($palace,$logfile,$Title,%in)?
Brad Gilbert
`grep` doesn't read the whole file first, so why is your perl code doing that?
Brad Gilbert
$palace is basically their home dir's name. $logfile is the file being read, in this case a huge log file. $Title is just some HTML that prints an <h1> thing, and %in has variables from the POST, like the search string itself, etc.
BHare
+2  A: 

You can bypass shell-quoting issues completely by specifying the arguments separately:

open my $grepfh, "-|", "grep", "-F", $in{'SEARCH'}, "/home/$palace/palace/logs/$logfile" or die "grep open error: $! $?\n";

Including the tail, without any error checking:

open( my $resultfh, '-|' )
or do {
    # child of the '-|' open: build a grep | tail pipeline whose final
    # output flows back to the parent through $resultfh
    pipe(my $rdfh, my $wrfh);
    if (fork) {
        # run grep with its STDOUT aimed into the pipe
        open( STDOUT, '>&', $wrfh );
        exec( "grep", @grepargs );
    }
    else {
        # run tail reading grep's output; its STDOUT still goes to the parent
        open( STDIN, '<&', $rdfh );
        exec( "tail", @tailargs );
    }
};
while (my $line = readline($resultfh)) {
    ....
}
ysth
This did not work for me. a) there is no tail command, b) you're using -F, which is fixed-string grep; I need regular expressions, c) it never specifies what file to grep.
BHare
a) yes, you'd have to do the tail separately; b) I used -F based on your answer, where you matched with \Q...\E; if you don't want -F, you don't want those in your Perl match either; c) oops, added.
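(For anyone comparing the two: a quick sketch of why \Q...\E in a Perl match corresponds to grep -F; the sample strings are invented.)

```perl
use strict;
use warnings;

# \Q...\E disables regex metacharacters, so the match behaves like a
# fixed-string search, i.e. grep -F.
my $user = 'a.*b';    # looks like a regex pattern

# the literal occurrence of "a.*b" is found:
print "literal match\n" if 'xa.*by' =~ /\Q$user\E/;

# a string the *regex* a.*b would match is not found literally:
print "no regex match\n" unless 'aXXb' =~ /\Q$user\E/;
```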
ysth
+2  A: 

Try this approach:

my $re = qr/\Q$in{'SEARCH'}\E/;
my @lines;
while (<$fh>) {
    next unless m/$re/;
    push @lines, $_;
    shift @lines if @lines > $NumLines;
}

print @lines;
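A small usage sketch of the sliding-window idea, with made-up input, showing that only the last $NumLines matching lines are ever held in memory:

```perl
use strict;
use warnings;

# Made-up input demonstrating the push/shift window: however long the
# input is, at most $NumLines matching lines are retained.
my $NumLines = 2;
my @lines;
for my $line ("x1\n", "hit a\n", "x2\n", "hit b\n", "hit c\n") {
    next unless $line =~ /hit/;
    push @lines, $line;
    shift @lines if @lines > $NumLines;
}
print @lines;    # only the two most recent matches: hit b, hit c
```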
Hynek -Pichi- Vychodil
This is what I was talking about above. while (<$fh>) reads from $fh one line at a time, instead of using reverse, which loads the entire file into memory.
Igor
+1  A: 

In terms of performance, Perl is excellent. It may not be as fast as the grep process, but I think the difference may not be noticeable to the end user.

Since one of your goals is control over the user input, I suggest you go with Perl, because it can give you as much control over the user input as you may need, now or in the future.

jpbaudoin
A: 

Perl has a built-in grep:

open(IN, $file) || die "Can't open $file: $!";
my @result = grep { /\Q$in{'SEARCH'}\E/ } <IN>;
close(IN);

# this is the equivalent of your tail
my $end   = $#result;
my $begin = $end - $NumLines + 1;
$begin = 0 if $begin < 0;
print "@result[$begin..$end]";
xcramps
This is not bad at all, but the Linux grep command still blows it away in terms of performance.
BHare
and the "shift" solution, how did it measure?
Massa
I just checked the benchmarks, and I was actually a bit wrong on this one; it was one of the slowest. Please see the update in my question for benchmark times. The shift solution was the fastest besides system grep.
BHare
A: 

Silly idea, but how about using grep with single quotes when there is no single quote in the search string, and Perl when there is? That way you get fast performance and security at the same time.
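In case it helps, here is a rough sketch of that hybrid (search_log, and the pure-Perl fallback shown, are my own illustrative inventions; it assumes a Unix grep and tail on the PATH, and a log path without single quotes):

```perl
use strict;
use warnings;

# Hybrid sketch: single-quote the pattern for grep when that is safe,
# otherwise fall back to a pure-Perl scan. search_log is a made-up name.
sub search_log {
    my ($search, $logpath, $num_lines) = @_;
    my @hits;
    if ($search =~ /'/) {
        # a single quote would break '...' quoting, so scan in Perl,
        # keeping only the last $num_lines matches (push/shift window)
        open my $fh, '<', $logpath or die "open $logpath: $!";
        while (my $line = <$fh>) {
            next unless $line =~ /$search/;
            push @hits, $line;
            shift @hits if @hits > $num_lines;
        }
        close $fh;
    }
    else {
        # no single quotes: safe to wrap in '...' and let grep | tail work
        open my $cmd, '-|', "grep -- '$search' '$logpath' | tail -n $num_lines"
            or die "pipe: $!";
        @hits = <$cmd>;
        close $cmd;
    }
    return @hits;
}
```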

Arkadiy
The solution I have now uses system grep, and I have it escape the '; it's a good solution. Now this thread is more about finding out why Perl isn't as fast as grep, but I think we found that out based on the Thompson idea.
BHare