views: 1932

answers: 11

This is really two questions, but they are so similar that, to keep it simple, I figured I'd just roll them together:

  • Firstly: Given an established Perl project, what are some decent ways to speed it up beyond just plain in-code optimization?

  • Secondly: When writing a program from scratch in Perl, what are some good ways to greatly improve performance?

For the first question, imagine you are handed a decently written project and you need to improve performance, but you can't seem to get much of a gain through refactoring/optimization. What would you do to speed it up in this case short of rewriting it in something like C?

Please stay away from general optimization techniques unless they are Perl specific.

I asked this about Python earlier, and I figured it might be good to do it for other languages (I'm especially curious whether there are corollaries to Psyco and Pyrex for Perl).

+69  A: 

Please remember the rules of Optimization Club:

  1. The first rule of Optimization Club is, you do not Optimize.
  2. The second rule of Optimization Club is, you do not Optimize without measuring.
  3. If your app is running faster than the underlying transport protocol, the optimization is over.
  4. One factor at a time.
  5. No marketroids, no marketroid schedules.
  6. Testing will go on as long as it has to.
  7. If this is your first night at Optimization Club, you have to write a test case.

So, assuming you actually have working code, run your program under Devel::NYTProf.

http://search.cpan.org/dist/Devel-NYTProf
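A typical run is two commands; yourscript.pl here is just a stand-in for your own program:

perl -d:NYTProf yourscript.pl
nytprofhtml      # reads ./nytprof.out, writes an HTML report under ./nytprof/

Open nytprof/index.html and look at where the exclusive time goes.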

Find the bottlenecks. Then come back here to tell us what they are.

If you don't have working code, get it working first. The single biggest optimization you will ever make is going from non-working to working.

Andy Lester
Best. Response. Ever.
pjf
Suggested edit: After "So, assuming..." add "If you don't have working code, get it working first. The single biggest optimization you will ever make is going from non-working to working."
Sherm Pendley
Done, thanks, Sherm.
Andy Lester
Rhetorical question: what is Perl specific for this optimization technique?
Peter Mortensen
It's not, of course. These rules apply to everything.
Andy Lester
+10  A: 

This only half relates to your question - but in the interest of documentation I'll post it here.

A recent CentOS/Perl bugfix increased the speed of our application more than two-fold. This is a must for anyone running Perl on CentOS and using the bless/overload functions.

leek
Oh, good point. For more on the bug and a test program to see if it affects your code, see http://perlbuzz.com/2008/08/red-hats-patch-slows-down-overloading-in-perl.html
Andy Lester
+25  A: 

Andy has already mentioned Devel::NYTProf. It's awesome. Really, really awesome. Use it.

If for some reason you can't use Devel::NYTProf, then you can fall back on good old Devel::DProf, which has come standard with Perl for a long time now. If you have true functions (in the mathematical sense) which take a long time to calculate (e.g., Fibonacci numbers), then you may find Memoize provides some speed improvement.
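A minimal sketch of the Memoize approach, using the classic Fibonacci example (any pure function called repeatedly with the same arguments benefits the same way):

use Memoize;

sub fib {
    my $n = shift;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

memoize('fib');        # replaces fib() with a caching wrapper
print fib(40), "\n";   # fast; the naive recursion would take ages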

A lot of poor performance comes from inappropriate data structures and algorithms. A good course in computer science can help immensely here. If you have two ways of doing things, and would like to compare their performance, the Benchmark module can also prove useful.
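As a sketch of what a Benchmark comparison looks like, here are two made-up ways of building a string; cmpthese() with a negative count runs each sub for that many CPU seconds and prints a rate table:

use Benchmark qw(cmpthese);

my @words = ('aa' .. 'zz');

cmpthese(-2, {
    concat => sub { my $s = ''; $s .= $_ for @words; },
    join   => sub { my $s = join '', @words; },
});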

The following Perl Tips may also prove useful here:

Disclaimer: I wrote some of the resources above, so I may be biased towards them.

pjf
+9  A: 

Without having to rewrite large chunks, you can use Inline::C to convert any single slow subroutine to C, or use XS directly. It's also possible to convert subs to XS incrementally; PPI/PPI::XS does this, for example.
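Here's a minimal sketch of the Inline::C route; sum_to() is just a made-up stand-in for some hot numeric subroutine:

# the C code is compiled on first run and cached afterwards
use Inline C => <<'END_C';
int sum_to(int n) {
    int total = 0;
    int i;
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
END_C

print sum_to(1000000), "\n";   # callable like any other Perl sub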

But moving to another language is always a last resort. Maybe you should get an expert Perl programmer to look at your code? More likely than not, (s)he'd spot some peculiarity that's seriously hurting your performance. Other than that, profile your code. Remember, there is no silver bullet.

With regards to Psyco and Pyrex: no, there's no equivalent for Perl.

tsee
+17  A: 
brian d foy
+4  A: 

Profile your application, using, for example, the profiler mentioned above. You will then see where the time is going.

If the time is being spent doing things other than CPU usage, you need to reduce those first - CPU is easy to scale, other things aren't.

A few operations are particularly slow, I have found:

  • keys() on a large hash is very bad; in list context it builds the entire list of keys in memory at once
  • Use of Data::Dumper for debugging. Calling it on a large structure is very slow; avoid it if you can. We've seen code which does:

    use Data::Dumper;
    $debugstr = Dumper(\%bighash);    # pays the full cost of Dumper() ...
    if ($debugflag_mostlyoff) {       # ... even when the flag is usually off
        log($debugstr);
    }

  • Most modules have alternatives with different performance characteristics; some of them perform incredibly badly.

  • Some regular expressions can be very slow (lots of .*, etc.) and can be replaced by equivalent ones that are faster. Regular expressions are quite easy to unit test and performance test: just write a program which runs them in a loop against a big simulated data set (see the sketch after this list). The best regular expressions start with something that can be tested very quickly, such as a literal string. Sometimes it's better not to look for the thing you're after first, and instead do a "look behind" to check whether it really is the thing you're looking for. Optimising regexps really is a bit of a black art at which I'm not very good.
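As promised in the last point, a sketch of such a test harness; the patterns and the simulated data are made up, and the point is the shape of the harness rather than these particular regexes:

use Benchmark qw(cmpthese);

# a big simulated data set: mostly non-matching lines
my @lines = map { "request $_ served in " . int(rand 500) . "ms" } 1 .. 1000;
push @lines, 'ERROR: disk full on /var';

cmpthese(-2, {
    'leading .*' => sub { my $n = grep { /.*ERROR: (.*)/ } @lines },
    'anchored'   => sub { my $n = grep { /^ERROR: (.*)/  } @lines },
});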

Don't consider rewriting something in C except as a last resort. Calling C from Perl (or vice versa) has a relatively large overhead. If you can get a quick Perl implementation, that's better.

If you do rewrite something in C, try to do it in a way which minimises the calling overhead and the number of calls into the perl runtime (the SV* functions, for example, mostly copy strings around). One way of achieving this is to make a C function which does more, and to call it fewer times. Copying strings around in memory is not cool.

On the other hand, rewriting something in C carries big risk because you can introduce new failure modes, e.g. memory leaks, crashes, security problems.

MarkR
+6  A: 

Well worth reading on the subject is Nicholas Clark's talk When perl is not quite fast enough. Some of the points are slightly dated, such as the reference to Devel::DProf, but keep in mind that it was written in 2002.

Nevertheless, much of the material covered remains relevant.

dland
+4  A: 

Method and subroutine calls aren't free in Perl; they're relatively expensive. So, if your profiling shows that you're spending a reasonably large chunk of the run time in small accessor methods, that might be a micro-optimization worth looking at.

However, what you should not do is replace accessors such as get_color() here:

package Car;
# sub new {...}

sub get_color {
   my $self = shift;
   return $self->{color};
}

package main;
#...
my $color = $car->get_color();

with encapsulation-breaking direct accesses:

my $color = $car->{color};

One would think that this goes without saying, but one also sees it done all over the place. Here's what you can do instead, using Class::XSAccessor:

package Car;
# sub new {...}
use Class::XSAccessor
  getters => {
    get_color => 'color',
  },
  setters => {
    set_color => 'color',
  };

This creates new methods get_color() and set_color() which are implemented in XS and thus about twice as fast as your hand-rolled version. Mutators (i.e. "$car->color('red')") are also available, as are chained methods.
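For illustration, usage stays exactly the same as with the hand-rolled accessors (assuming the elided new() blesses a hash reference, as before):

my $car = Car->new;
$car->set_color('red');
print $car->get_color(), "\n";   # prints "red"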

Depending on your application, this may give you a very tiny (but essentially free) boost. Do not expect more than 1-2% unless you're doing something peculiar.

tsee
+1  A: 

If your code needs speeding up then chances are that your test suite does too. This talk touches on the key points:

Turbo Charged Test Suites

EvdB
+1  A: 

The most cost-effective method might be to consider faster hardware (that is, an appropriate hardware architecture). I am not talking about faster CPUs, but rather faster disks, faster networking... faster anything, really, that speeds up I/O.

I experienced this many years ago, when we moved an XML-parsing-based application (bleeding-edge technology at the time <g>) from a (fast and reliable!) Windows server to a dedicated, albeit somewhat outdated, Sun platform with faster I/O all around.

As always, consider

  • developer performance (how long does it take to code, how complex is the problem, is the result maintainable),
  • hardware performance,
  • software performance

and improve where most (cost!) effective for the problem at hand...

I like this idea, but it should be said that sometimes spending a little more on developer time up front can cost far less in the long run than buying hardware for the speedup. It really depends on how long the project is intended to run.
Ape-inago
+4  A: 

The best way to make your program run faster is to make your program do less work. Pick the right algorithm for the job. I've seen lots of slow applications because they pick a dumb algorithm in some area of the code that gets called millions of times. When you are doing a million * a million operations instead of just a million operations, your program is going to run a million times slower. Literally.

For example, here's some code I saw that inserts an element into a sorted list:

while (my $new_item = <>) {
    push @list, $new_item;
    @list = sort @list;    # re-sorts the entire list on every iteration
    # ... use sorted list
}

sort is O(n log n), so re-sorting on every insert makes the loop above O(n^2 log n) overall. Finding the insertion point in an already-sorted list takes only O(log n) comparisons (the splice itself still shifts elements, but that is far cheaper than a full re-sort).

Fix the algorithm.
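For completeness, here's a sketch of one possible fix: binary-search for the insertion point and splice the new item in, so @list stays sorted without re-sorting (string comparison via lt, matching the default sort above):

while (my $new_item = <>) {
    my ($lo, $hi) = (0, scalar @list);
    while ($lo < $hi) {                      # binary search: O(log n) comparisons
        my $mid = int(($lo + $hi) / 2);
        if ($list[$mid] lt $new_item) { $lo = $mid + 1 }
        else                          { $hi = $mid }
    }
    splice @list, $lo, 0, $new_item;         # insert in place; @list stays sorted
    # ... use sorted list
}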

jrockway