views: 137

answers: 5

G'day,

We have a Perl script that processes geolocation requests from the head-end servers of a major web site. The script is a broker that applies additional business logic to the data returned by a COTS product, which supplies details for a given IP address, e.g. country, connection type, routing type, carrier, etc.

This Geo service is currently handling peak loads of around 1,000 requests per second at the COTS backend. BTW, it is actually serving 5,000 requests per second from its dedicated load-balance/cache layer that sits directly in front of the broker layer.

I have recently had to modify the behaviour of this broker to allow for a new category of connection that we've been seeing on the site and which has been causing some problems.

The original version of the script (not my design, btw!) was built using a mixture of config items embedded in the script itself and other items held in separate Perl fragments. As was quite rightly pointed out during the peer review of my changes, we should probably migrate all of the config items out into separate files rather than continue with a mixture of embedded and separate config items.

Now I want to take this further and put all config items, created as separate Perl hashes, into a single config file.

At the moment, we have to stop and restart the whole application to get the new config items reloaded, which, given the traffic levels, is a bit inconvenient, even though there are four instances of the broker across two separate data centres, so we never actually lose the service.

I suspect that I am going to have to resort to keeping a timer, or maybe a request counter, and performing a stat on the config file in question. Or maybe even have a configured TTL for the config file and just reload it every ten minutes or so.
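
To make that concrete, this is roughly what I have in mind; only a rough sketch, with a made-up path and TTL, and assuming the config file is a Perl fragment that returns a hash reference:

```
use strict;
use warnings;

my $config_file = '/opt/geo/etc/broker_config.pl';   # made-up path
my $ttl         = 600;                                # re-check at most every ten minutes

my $config     = do $config_file or die "can't load $config_file: $@ $!";
my $last_mtime = (stat $config_file)[9];
my $last_check = time;

sub config {
    if ( time - $last_check >= $ttl ) {
        $last_check = time;
        my $mtime = (stat $config_file)[9];
        if ( defined $mtime and $mtime != $last_mtime ) {
            my $fresh = do $config_file;
            if ( ref $fresh eq 'HASH' ) {    # keep the old config if the reload fails
                $config     = $fresh;
                $last_mtime = $mtime;
            }
            else {
                warn "reload of $config_file failed: $@";
            }
        }
    }
    return $config;
}
```

One nice property is that a broken config file only produces a warning and the old hash stays in service.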

But is there a way to make Perl automatically reload a newer version of a file that it has previously loaded? I'm thinking of behaviour like that provided by the Apache mod_perl module.

cheers,

+4  A: 

Rob, a couple of points:

1) Preferably, abstract the config reader into an API rather than reading directly from a Perl hash. That way, any call to that API can in turn decide what needs to be done about the config (e.g. is the timer up? has the config file's timestamp changed?).

As always, this has the added benefit of allowing you to re-design the config storage later on (Perl hash => XML => database) without changing any of the software.

2) Seeing as it is a server, I'd also recommend adding on-demand config reload functionality via a special request type. This allows you to force a config reload (e.g. after you update the config files) by sending a command to the server instead of bouncing it.

BTW, #2 is very easy to do if you follow #1, since all the "reload config" handler needs to do is set a "config needs to be reloaded on the next config API call" flag.

3) If you insist on having the config as a hash with no API (e.g. for performance reasons, to eliminate API subroutine calls, which is plausible but unlikely to help much), then you need to place the config into a static variable in your class and have that class provide a "set new config" method. The server would then set a timer and, on each timer call (or upon receiving the "reload config" command from #2), check whether the timestamp and/or checksum of the config file differs from the last time it checked, and reload if so.
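
To make points 1) to 3) concrete, here is a minimal sketch (package name, path and file layout are all invented, and the config file is assumed to return a hash ref):

```
package GeoConfig;            # invented name
use strict;
use warnings;

my $config_file    = '/opt/geo/etc/broker_config.pl';   # invented path
my $config         = {};
my $last_mtime     = 0;
my $reload_pending = 1;       # forces a load on first use

# (2) the "reload config" request handler only needs to flip this flag
sub request_reload { $reload_pending = 1 }

# (1) every consumer goes through the accessor instead of the raw hash
sub get {
    my ($key) = @_;
    _maybe_reload();
    return $config->{$key};
}

# (3) reload when asked to, or when the file's timestamp has changed
sub _maybe_reload {
    my $mtime = (stat $config_file)[9];
    return unless $reload_pending
        or ( defined $mtime and $mtime != $last_mtime );

    my $fresh = do $config_file;
    if ( ref $fresh eq 'HASH' ) {
        $config         = $fresh;
        $last_mtime     = $mtime if defined $mtime;
        $reload_pending = 0;
    }
    else {
        warn "config reload from $config_file failed: $@";
    }
}

1;
```

Callers then say GeoConfig::get('some_key') and never touch the hash directly, so the storage can later be swapped out without changing them.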

DVK
P.S. I guess someone may suggest using tied hashes, but I have concerns over the performance of a hash tied directly to a file, given your scale.
DVK
A: 

@DVK, ooh, good call with your point 1). I hadn't thought of adding an extra layer of abstraction above the config. I guess the advantage of leaving it as raw Perl hashes is that there is no conversion layer required. I'll have a think about the tradeoffs involved.

As to point 2), we have found enough flakiness with signal handling in Perl that it didn't HUP well, so we have officially deprecated Perl signal HUPs on a site-wide basis.

But I really like your idea of a special lookup command that would force a reload! I'll use that if you don't mind. Maybe use "GET 127.0.0.1", as that's probably not going to come through from outside! N.B. The latest version of our geo protocol is HTTP based, so we can easily interrogate the service from a browser.

Thanks! \o/

Rob Wells
If you're implementing HTTP yourself, then consider using a completely new method instead of a "magic" GET. i.e. `RESTART / HTTP/1.1` and, of course, make sure to authenticate it somehow...
hobbs
@hobbs, thanks for the tip. I might do that as we do parse the incoming request. These geo rq's are locked down by very tight ACLs so that only rq's from specified hosts are handled. The only problem I see with this is in the future, when we are thinking of replacing the Perl broker with a dedicated module running under Apache.
Rob Wells
Re #2, I meant a special command (usually a brand new command, but some voodoo value of an existing command is of course feasible if necessary). Definitely didn't mean a system signal a la Apache.
DVK
+1  A: 

The traditional approach to this sort of problem on Unix-type machines is for the server program to reload its configuration on receipt of a signal. For example, the Apache documentation indicates that three signals have special meaning to the server: TERM tells the server to shut down, HUP forces an immediate restart and USR1 requests a graceful reloading of the configuration file. This sort of functionality could relatively straightforwardly be built into your program, provided you are working in an environment which supports signals.
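
In Perl this could look something like the sketch below (the reload routine itself is left as a stub). Keeping the signal handler down to setting a flag, and performing the actual reload at a safe point in the main loop, also avoids most of the re-entrancy problems signal handlers can otherwise cause:

```
use strict;
use warnings;

my $reload_requested = 0;

# Keep the handler tiny: just record that a reload was asked for.
$SIG{USR1} = sub { $reload_requested = 1 };

while (1) {
    if ($reload_requested) {
        $reload_requested = 0;
        warn "reloading configuration\n";   # call the real reload routine here
    }
    sleep 1;   # stands in for accepting and handling a geo request
}
```

You would then send `kill -USR1 <pid>` after pushing out a new config file.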

Tim
@Tim, we have so much flakiness with Perl signal handling that we have pretty much abandoned using it site-wide.
Rob Wells
BTW Added the Solaris 10 tag for clarity.
Rob Wells
+1  A: 

If you're on a fairly recent version of Linux, there is always the inotify route. This means you can reload the config as soon as it is written to disk. Check out Linux-Inotify. There is also FAM for other platforms.
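
A sketch of what that looks like with Linux::Inotify2 (path made up, untested against your setup):

```
use strict;
use warnings;
use Linux::Inotify2;

my $config_file = '/opt/geo/etc/broker_config.pl';   # made-up path

my $inotify = Linux::Inotify2->new
    or die "unable to create inotify object: $!";

# Fire the callback whenever the file is written and closed.
$inotify->watch( $config_file, IN_CLOSE_WRITE, sub {
    my $event = shift;
    warn $event->fullname, " changed, reloading config\n";
    # call your reload routine here
});

# Blocking event loop; in a real server you would fold this into the
# existing select/poll loop rather than block here.
$inotify->poll while 1;
```

One wrinkle: if your deploy writes a temp file and renames it over the config, the watch stays on the old inode, so you may want to watch the directory instead.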

Adam Flott
@aflott, thanks but we're on Solaris.
Rob Wells
use File::ChangeNotify.
jrockway
+1  A: 

There's always the option of moving the configs into a database and using DBI plus database triggers to make it event-driven rather than polled.
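
For example (DSN, credentials and table layout all invented), pulling the config out of a key/value table is only a few lines with DBI:

```
use strict;
use warnings;
use DBI;

# DSN, credentials and table layout are invented for illustration.
my $dbh = DBI->connect(
    'dbi:Oracle:geodb', 'broker', 'secret',
    { RaiseError => 1, AutoCommit => 1 },
);

sub load_config {
    my %config;
    my $sth = $dbh->prepare('SELECT name, value FROM broker_config');
    $sth->execute;
    while ( my ($name, $value) = $sth->fetchrow_array ) {
        $config{$name} = $value;
    }
    return \%config;
}

my $config = load_config();
```

A trigger could then, say, bump a single version row on any change, so the broker only needs a cheap one-row check to decide whether to re-read the whole table.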

@Howard, thanks for the suggestion, but I'm now thinking of making the business logic a separately loaded module as well, like they do with the lbnamed load-balancing name server.
Rob Wells