tags:
views: 49
answers: 2
I wrote a Perl program that searches and manipulates one text file. The CGI process slurps the file directly into memory, manipulates it based on the user's input, and then generates the HTML result.

Functionally, it works. However, I know that once I deploy it on a high-volume server, it will not be able to respond in time. I suspect memory will be the bottleneck. What is the best way to share that file, so that it is read into memory once when the server starts and never again?

The solution I am guessing at is a server daemon that loads the file into memory and serves the data to other processes/threads. If so, what is the best method to implement the IPC?

+2  A: 

That's pretty much why mod_perl was created.

msw
+5  A: 

Use FastCGI. It effectively turns your CGI program into a little server that your web server calls. It's simple and works on lots of different web servers. Because your CGI runs in its own process, it can run on a different machine from your web server, and you can scale your program across multiple application servers. And it works with most major programming languages.

The advantage of mod_perl is that it lets you poke at the guts of Apache using Perl. If you're not using that, it's overkill. mod_perl has a lot of drawbacks. It ties you to Apache and to whatever version of Perl was compiled into mod_perl. It mixes the configuration and memory space of all your Perl applications with Apache's, which complicates configuration and leads to memory bloat. And mod_perl is really complicated and difficult to upgrade.

Schwern
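
To make the structural difference concrete, here is a minimal sketch (not from the thread itself) of what a Perl FastCGI script looks like, assuming the CGI::Fast module from CPAN. The body of the loop is what a plain CGI script would be in its entirety; anything above the loop runs only once per process:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI::Fast;

    # Runs once, when the web server starts this FastCGI process.
    my $started = localtime();

    # Each incoming request comes through this loop; the process
    # stays alive between hits instead of being recompiled each time.
    while ( my $q = CGI::Fast->new ) {
        print $q->header('text/html');
        print "<p>Process started: $started</p>\n";
        print "<p>Request handled: ", scalar localtime(), "</p>\n";
    }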
Also, mod_perl can have fairly "interesting" data persistence bugs.
Paul Nathan
That may be an implementation solution, but we are not at that stage yet, regardless of which CGI module we use. In terms of coding, what is the best practice for writing code that reads the same file over and over? I am looking for something along the lines of shared memory, mmap, or IPC using sockets.
Face
@Face You could write a server to hold the file in memory and hand it off to other processes, but if that's all it does you're essentially reproducing the operating system's disk cache. Badly. The best practice is *not to* read the file over and over again by keeping your process persistent. This also saves the cost of recompiling the code and redoing initialization each hit. This is what FastCGI buys you. If you want to hold some *calculated* data persistent in memory so other processes can share it, try something like memcached. If the data is complicated, look into NoSQL.
Schwern
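
As a rough illustration of the memcached suggestion (a sketch, not code from the thread: it assumes a memcached server on 127.0.0.1:11211, the Cache::Memcached module from CPAN, and a hypothetical compute_subsection() standing in for the expensive calculation):

    use strict;
    use warnings;
    use Cache::Memcached;

    my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

    sub cached_subsection {
        my ($section) = @_;
        my $key = "myapp:subsection:$section";

        # Any persistent process on any application server sees the same cache.
        my $result = $memd->get($key);
        return $result if defined $result;

        # compute_subsection() is a placeholder for the real calculation.
        $result = compute_subsection($section);
        $memd->set($key, $result, 300);    # keep it for five minutes
        return $result;
    }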
@Face It may not have been clear: FastCGI is not just another CGI module, like CGI.pm vs CGI::Application. It changes the way the web server loads and interacts with your CGI program. Instead of compiling, running, and throwing away your process on every hit, FastCGI turns your CGI program into a little server that your web server talks to. Your CGI process remains persistent between hits, avoiding recompilation and initialization and allowing it to cache data. This is the absolute first thing you must do to scale a CGI program, and it opens up additional scaling techniques.
Schwern
@Schwern, assuming the code below:

    open (FH, '<', $myfile);
    @lines = <FH>;
    close(FH);

    sub file_subsection {
        # Do some processing on @lines and return the desired subsection
    }

    sub file_query {
        # Do some search, count occurrences, or query @lines and return results
    }

    sub gen_HTML {
        # Generate the HTML response output
    }

Would the file be opened and copied into memory only once? If so, how does FastCGI know what to persist and what to make volatile?
Face
@Face You stick your normal CGI code in a loop and it processes a request each time around. There's no magic, it's just a loop. If you want something to persist, put it in a global or lexical outside the loop. See http://search.cpan.org/perldoc?CGI::Fast and http://search.cpan.org/perldoc?FCGI.
Schwern
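
Applied to the code from the earlier comment, that might look something like the sketch below (assuming CGI::Fast; the path and the search logic are placeholders). The file is slurped into a lexical before the loop, so it is read once per process lifetime and reused on every request:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI::Fast;

    # Done once, at process start: slurp the file into memory.
    my $myfile = '/path/to/data.txt';    # placeholder path
    open my $fh, '<', $myfile or die "Cannot open $myfile: $!";
    my @lines = <$fh>;
    close $fh;

    # Each request reuses @lines; the file is never re-read.
    while ( my $q = CGI::Fast->new ) {
        my $query = $q->param('q') // '';
        my $count = grep { /\Q$query\E/ } @lines;

        print $q->header('text/html');
        print "<p>'$query' occurs on $count line(s).</p>\n";
    }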
Thanks. I looked into FastCGI, and it looks like it will do exactly what I need.
Face