ansaurus

Question

Answer 1

+2 A:

Why couldn't you simply run the program and pipe the results to your perl script?

./program $arg $arg1 | myscript

Actually, you could probably get rid of the perl entirely:

./program $arg $arg1 | grep /\d...whatever.../ | sort
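
If the Perl filter stays, the receiving end of that pipe could look roughly like the sketch below. The helper name sorted_scores and the assumption that the score is the last field on each line are illustrative only, since the real output format of the program is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from a filehandle, capture the number at
# the end of each matching line, and return the numbers sorted ascending.
sub sorted_scores {
    my ($fh) = @_;
    my @scores;
    while (my $line = <$fh>) {
        # Assumed format: the score is the last whitespace-separated field.
        push @scores, $1 if $line =~ /\s(\d*\.?\d+)\s*$/;
    }
    return sort { $a <=> $b } @scores;
}

# Demo on an in-memory filehandle so the sketch runs on its own.
my $sample = "item1 12 3.5\nno score here\nitem2 7 10\nitem3 1 .75\n";
open(my $in, '<', \$sample) or die "Cannot open in-memory handle: $!";
print "$_\n" for sorted_scores($in);
close($in);
```

Passing \*STDIN instead of $in would turn this into a filter that reads whatever is piped into it, so nothing is written to disk in between.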

chris 2010-09-14 18:10:39

Pipe to grep is how I'd do it too. On Unix. Maybe he's on Windows and all he has installed is Perl and he doesn't want to install Cygwin.

Zan Lynx 2010-09-14 18:47:56

The other problem is that this is obviously a snippet of a larger program (see the @data array). I really wanted to write a bash example that does that loop, but who knows what he has going on in perl.

Mike Axiak 2010-09-14 18:51:21

@Mike: Well, the first option would take care of the larger program problem. Guess we should have asked "how slow". :)

chris 2010-09-14 22:44:35

Even if it's part of a larger program, my @lines = `program $arg $arg1 | grep ...` would still cut out the middleman of writing out a file and reading it back in.

Schwern 2010-09-14 23:56:24

`grep` from http://gnuwin32.sf.net/packages.html is an alternative to Cygwin.

daxim 2010-09-15 09:05:55

Answer 2

+5 A:

There are a few things I can see (for instance, not writing the output to a file first), but I suspect the main performance benefit will come from using a different regex. To that end, do you have a better idea of what your program's output format is?

Here's some sample perl that may run a little bit quicker:

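use strict;
use warnings;
# @data and $arg1 come from the surrounding program.
foreach my $arg (@data) {
  my @score = ();
  # Read the program's output through a pipe instead of a temporary file.
  open(my $fh, "program $arg $arg1 |")
    or die "Cannot run program: $!";
  while (<$fh>) {
    chomp;
    # $1 holds a number such as 12, .5 or 12.5 that follows whitespace.
    if (/\d+.+\s+((\d+)?\.?\d+)/) {
      push(@score, $1);
    }
  }
  close($fh);
  my @sorted = sort { $a <=> $b } @score;
}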

Notice a few things here:

I'm reading from a piped filehandle rather than a temporary file, which skips a whole pass over the data.
I changed the regex to use nested groups rather than multiple alternatives.
I use strict and declare my variables with my (for the love of God, use strict in your Perl).

The other people have said to use threads. You DO NOT need to do this, as running the process as I have done with the trailing pipe (|) in the open function causes perl to fork a process for you. Then you use standard unix pipes to read from the program asynchronously.

Mike Axiak 2010-09-14 18:12:09

I think you're not understanding the thread recommendation. If he turns the `foreach my $arg (@data)` loop into a bunch of threads, he can run `program` two or more times in parallel, thus potentially speeding up his program that way. Putting a pipe in the open function doesn't do this. (As far as I know, and it would be incredible to have that happen.)

CanSpice 2010-09-14 21:32:30
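
The parallelism described in the comment above can be approximated without threads by starting every child process before reading from any of them. This is a minimal sketch of that swapped-in technique, not the threads-based version the comment mentions. It assumes a Unix-like system, uses the running perl ($^X) as a stand-in for the real program, and the name run_in_parallel is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: start one child per argument, then collect the output.
sub run_in_parallel {
    my @args = @_;
    my @handles;
    for my $arg (@args) {
        # List-form piped open forks a child and returns right away,
        # so all children are running before we read from any of them.
        open(my $fh, '-|', $^X, '-e', 'print $ARGV[0] * 2, "\n"', $arg)
            or die "Cannot start child for $arg: $!";
        push @handles, $fh;
    }
    my @outputs;
    for my $fh (@handles) {
        chomp(my @lines = <$fh>);
        close($fh) or warn "Child exited abnormally: $?";
        push @outputs, \@lines;
    }
    return @outputs;
}

my @results = run_in_parallel(1, 2, 3);
print "@$_\n" for @results;
```

One caveat with this design: a child that writes more than the pipe buffer holds will block until its output is read, so very chatty programs overlap less than this sketch suggests.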

Ah I did misunderstand that, thanks :-)

Mike Axiak 2010-09-14 21:46:37

Answer 3

A:

Yup, first of all: redirecting the program's output to a file and reading it back afterwards is stupid and expensive. Why not just do this?

my @result = `program $arg $arg1`;
foreach(@result) {...

Second thing is you can parallelize the outer foreach. perldoc threads, threads::shared.

hlynur 2010-09-14 18:26:57

-1 because he says the result is a very large file. Reading it into a Perl list will likely overflow his RAM.

Zan Lynx 2010-09-14 18:48:45

He says $arg1 is a very large file. He didn't say the *program* output is very large.

hlynur 2010-09-14 18:52:50

This is why perl allows you to use pipe in open(): http://perldoc.perl.org/perlipc.html#Using-open()-for-IPC

Mike Axiak 2010-09-14 19:11:26

That program you provided doesn't even work the way you think it does. Please delete this answer.

Brad Gilbert 2010-09-16 00:41:54

@Brad Gilbert nnaah... I'll leave it as it is.

hlynur 2010-09-16 05:34:03

my @result = split /\n/, `program $arg $arg1`;

Brad Gilbert 2010-09-16 13:44:26

Sure, if you mind the trailing newlines.

hlynur 2010-09-16 19:47:21
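
The difference the last few comments are arguing about can be seen in a small sketch. Backticks in list context return one element per line with the newline still attached, while split /\n/ or chomp removes it. The running perl ($^X) stands in for the real program here, which is an assumption made so the example runs by itself.

```perl
use strict;
use warnings;

my $perl = $^X;   # the running perl stands in for the real program
my $cmd  = qq{"$perl" -le "print for 1..3"};

# List context: one element per line, each still ending in "\n".
my @raw = `$cmd`;

# chomp on a list strips the trailing newline from every element.
my @chomped = @raw;
chomp(@chomped);

# split evaluates the backticks in scalar context and uses the
# newlines as separators, so they do not appear in the result.
my @split = split /\n/, `$cmd`;

printf "raw ends with newline: %s\n", ($raw[0] =~ /\n\z/ ? "yes" : "no");
print "chomped: @chomped\n";
print "split:   @split\n";
```

Both forms hold the whole output in memory at once, which is the concern raised earlier about very large results.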

Answer 4

+2 A:

Have you profiled your program? Without profiling, you don't know if the vast majority of the time is spent in the external program or in your program.

Profiling is an important step in optimization; without it, you're essentially guessing where speed improvements can be made. Profiling will show you which steps take the most time.

That said, as hlynur said, you could probably parallelize your external program calls using threads. You might also gain some optimizations through a different regular expression, but there's no real way to tell how much you'll gain without profiling first.
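
As a rough first measurement before reaching for a real profiler (Devel::NYTProf from CPAN is one option), the time spent in the external command can be compared with the time spent parsing its output. This is only a sketch: the helper name timed is made up, and the running perl ($^X) stands in for the real program.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: run a code reference and return the elapsed
# seconds followed by whatever the code returned.
sub timed {
    my ($code) = @_;
    my $start  = time;
    my @result = $code->();
    return (time - $start, @result);
}

my $perl = $^X;   # stand-in for the real external program

# Step 1: time the external command on its own.
my ($run_secs, @lines) = timed(sub { `"$perl" -le "print for 1..1000"` });

# Step 2: time the regex work on the captured output.
my ($parse_secs, @nums) = timed(sub { map { /(\d+)/ } @lines });

printf "external command: %.4fs\nparsing:          %.4fs\n",
    $run_secs, $parse_secs;
```

If the first number dominates, tuning the regex will not help much and running several copies of the program at once is the more promising direction.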

CanSpice 2010-09-14 18:37:51

Optimize Perl external command
