I've been benchmarking the performance of a framework I'm writing in Perl and I'm getting a 50% decrease in requests per second over our existing codebase (some hit is understandable, because we're going from procedural spaghetti code to an OOP MVC framework).

The application is running under mod_perl, and I've added Moose and all my framework code to the startup.pl script, which by itself doubled my requests per second. I'm looking to improve this number further to get it as close as possible to the existing figure. There is an argument that this is premature optimisation, but there are a couple of glaring inefficiencies that I'd like to fix to see how they affect performance.

Like most frameworks, I have a configuration file and a dispatcher. The config part is handled by Config::General, so a bit of IO and parsing is involved to get my config file loaded into the app. The biggest problem I see here is that I'm doing this for EVERY REQUEST that comes in!

Running Devel::DProf on my app points to Config::General::BEGIN and a bunch of related IO modules as one of the major slow points that isn't Moose. So what I'd like to do, and what makes a lot more sense in hindsight, is take advantage of mod_perl's persistence and the startup.pl compilation stuff to load the config file only once, when the server starts.

The problem is that I'm not too familiar with how this would work.

Currently each project has a PerlHandler bootstrapping class which is pretty lean and looks like this:

use MyApp; 
MyApp->new(config_file => '/path/to/site.config')->run();

MyApp.pm inherits from the framework Project module, which has this code:

my $config = Config::General->new(
                -ConfigFile => $self->config_file,
                -InterPolateVars => 1,
             );    

$self->config({$config->getall});

To only do this at compile time, both my bootstrap and Project base modules will have to change (I think), but I'm pretty unsure as to what changes to make and still keep the code nice and lean. Can anyone point me in the right direction here?

UPDATE

I tried the BEGIN BLOCK in each project module approach as described by ysth in his answer. So I now have:

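package MyApp::bootstrap;
use Config::General;   # must be loaded before the BEGIN block below runs
use MyApp;

my $config;
BEGIN
{
    # runs once, when this module is compiled
    $config = {Config::General->new(...)->getall};
}

sub handler {
    # ..etc.
    MyApp->new(config => $config)->run();
}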

This quick change alone gave me a 50% increase in requests per second, confirming my suspicion that the config file was a major bottleneck worth fixing. The benchmark figure for the existing code on our crotchety old dev machine is 60rps, and my framework has gone from 30rps to 45rps with this change alone. For those who say Moose is slow and has a compile-time hit: I got the same (50%) increase from pre-compiling my config file as I did from compiling all my Moose code at start-up.

The only problem I have now is that this violates the DRY principle, since the same Config::General->new code is in every BEGIN block with only the path to the config file differing. I have a few different strategies to limit this, but I just wanted to post the results of this change.

+7  A: 

Assuming your applications don't change the config at all, move it into a BEGIN block:

# this code goes at file scope
my $config;
BEGIN {
    $config = { Config::General->new( ... )->getall }
}

# when creating a new instance
$self->config( $config );

And make sure all your modules are compiled in startup.pl.

You could get fancier, and have a singleton class provide the config hash, but you don't need to.
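
As a rough illustration of the "fancier" option, here is a minimal sketch of a class that parses each config file once and hands back the cached hash afterwards. The package name MyFramework::ConfigCache and the set_loader hook are made up for this example; the default loader simply repeats the Config::General call from the question.

```perl
package MyFramework::ConfigCache;   # hypothetical package name
use strict;
use warnings;

# One parsed config per file path; lives as long as the Perl interpreter does.
my %cache;

# Default loader mirrors the Config::General call from the question.
# Config::General is only required when this loader actually runs.
my $loader = sub {
    my ($path) = @_;
    require Config::General;
    my %parsed = Config::General->new(
        -ConfigFile      => $path,
        -InterPolateVars => 1,
    )->getall;
    return \%parsed;
};

# Hypothetical hook for swapping in a different parser (handy in tests).
sub set_loader {
    my ($class, $code) = @_;
    $loader = $code;
    return;
}

# Parse the file the first time a path is requested, then reuse the hashref.
sub get {
    my ($class, $path) = @_;
    $cache{$path} = $loader->($path) unless exists $cache{$path};
    return $cache{$path};
}

1;
```

Each project's BEGIN block then shrinks to a single call such as MyFramework::ConfigCache->get('/path/to/site.config'), so only the path differs between projects. Every caller shares the same hashref, so this assumes nobody modifies the config after loading.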

ysth
The problem with this solution is that you have to create this BEGIN block with that code for every project module (each project has its own config file). I did put this in quickly and I got a 50% increase in requests per second, so I'm upvoting the answer anyway, because it does answer my question.
David McLaughlin
To get around the "each project needs its own config" problem you could 1) combine all of the files into one with different sections (you could use an INI file or Config::ApacheFormat), or 2) have a config class which holds each config file in a hash and pulls the right one based on some $ENV var.
mpeters
I never really considered having one huge config file because of how hard it would be to maintain when you have hundreds of projects... but there are actually a lot of benefits in doing something like that, since we share database connections across all the projects except in a handful of cases. Thanks.
David McLaughlin
A: 

I had the same problems in an HTML::Mason framework install, and found this to work rather well: In httpd.conf:

PerlRequire handler.pl
<FilesMatch "\.mhtml$">
  SetHandler perl-script
  PerlHandler YourModule::Mason
</FilesMatch>

In your handler.pl file, you define all of your static items like your config, database handles, etc. This defines them in the scope of YourModule::Mason which is compiled when the apache thread starts (new threads will obviously have an inherent overhead). YourModule::Mason then has a handler method which handles the request.

I will admit that there may be some magic that is happening in HTML::Mason that is helping me with this, but it works for me, maybe for you?

Jack M.
See my answer below. It's more than just having your mason objects instantiated, because by then you've loaded a big chunk of Mason, that does not have to be reloaded, and can be transparently shared among workers.
Len Jaffe
+2  A: 

Are you able to make your Moose classes immutable? This might give you another speed boost.

draegtun
Of course. This is one of the first things you learn to do when using Moose.
David McLaughlin
Yes, but it's useful to mention for people who don't know! (In fact, when I first tried Moose this wasn't even available, or at least not documented.)
draegtun
A: 

JackM has the right idea.

By loading all of your classes and instantiating your application-level objects (in your case, the configuration) in the "mother" Apache process, you don't have to compile them each time a new worker spawns, since they're already available and in memory. The very meticulous amongst us add a "use" line for every module that their application uses regularly. If you don't load your packages and modules in the mother ship, each worker not only takes the performance hit of loading the modules, but also loses the benefit of the memory sharing that modern operating systems provide.

It is really the other half of the difference between mod_perl and CGI, the first half being mod_perl's persistent Perl engine versus CGI respawning perl for each invocation.
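
For illustration, a startup.pl that preloads modules in the parent process might look roughly like the fragment below. The module list is only a guess based on the names that appear in the question; a real file would list whatever the application actually uses.

```perl
# startup.pl: loaded once by the parent Apache process
use strict;
use warnings;

# Preload everything the workers will need, so they inherit the compiled code.
# The empty import lists just load the modules without importing anything here.
use Moose ();
use Config::General ();
use MyApp ();
use MyApp::bootstrap ();

1;
```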

Len Jaffe
Read the question? I mentioned startup.pl several times. http://perl.apache.org/docs/2.0/user/config/config.html#C_PerlPostConfigRequire_
David McLaughlin
A: 

A common way of speeding up such things with few changes is to simply use global variables and cache state in them between invocations of the same Apache process:

use vars qw($config);
# ...
# getall returns a plain hash, so wrap it in a hashref before caching it
$config = { Config::General->new( ... )->getall }
    unless ref($config) eq 'HASH'; # add more suitable test here

It's not very clean and can lead to obscure bugs (although "my $var" leads to more in my experience), and it sometimes eats a lot of memory, but many repeated, expensive initialization statements can be avoided this way. The advantage over using only BEGIN {} code is that you can also re-initialize based on other events without needing to restart Apache or kill your process (e.g. by including the timestamp of a file on disk in the test above).
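
The timestamp idea could be sketched along these lines. The helper name cached_config and the %config_cache global are invented for this example, and the loader is passed in as a code ref so that any parser can be used (with Config::General it would be something like sub { return { Config::General->new(-ConfigFile => $_[0])->getall } }).

```perl
use strict;
use warnings;

# Package global: survives between requests handled by the same Apache child.
# Maps each config path to { mtime => ..., data => ... }.
our %config_cache;

# Hypothetical helper: $load is a code ref that parses $path and returns a hashref.
sub cached_config {
    my ($path, $load) = @_;

    my $mtime = (stat $path)[9];    # element 9 of stat() is the modification time
    die "Cannot stat $path: $!" unless defined $mtime;

    # Reload on first use, or whenever the file's timestamp has changed.
    my $entry = $config_cache{$path};
    if (!$entry || $entry->{mtime} != $mtime) {
        $entry = $config_cache{$path} = {
            mtime => $mtime,
            data  => $load->($path),
        };
    }
    return $entry->{data};
}
```

This costs one stat call per request instead of a full parse. Each Apache child keeps its own copy, so after an edit the children pick up the new file independently, on their next request.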

Watch out for the gotchas though: an easy way to break in

mjy
+1  A: 

A module's import sub is executed at compile time, so we could use that to reduce or eliminate the DRY violation in ysth's answer.

In the following example we use an import method to read the configuration file with the arguments given to us and then push that configuration into the calling package.

The caveat being any $config variable in the calling package is going to get wiped out by this.

# Foo_Config.pm
package Foo_Config;
use strict;
use warnings;
use Config::General;
use English qw(-no_match_vars);
sub import {
   my ($self, @cfg) = @ARG;
   my $call_pkg     = caller;
   my $config       = {Config::General->new(@cfg)->getall};
   do{ # this will create the $config variable in the calling package.
       no strict 'refs';
       ${$call_pkg . '::config'} = $config;
   };
   return;
}
1;

# MyApp.pm
package MyApp;
# will execute Foo_Config->import('/path/to/site.config') at compile time.
use Foo_Config '/path/to/site.config';
our $config;   # the package variable that import() filled in
Danny
+1, your solution addresses the OP's complaint after the update.
Adam Bellaire