Hello,

I am using Apache/2.2.8 (Ubuntu) mod_python/3.3.1 Python/2.5.2 and I would like to preload the data I work with.

Currently I read the data from a file on disk every time I get a request, then parse it and store it in an object. The data file is relatively large and I would like to parse/preload it ahead of time.

I was thinking I could either (1) load the data into memory when Apache starts (roughly 100 MB to 500 MB would stay resident while the server is running), or (2) load it when the first data request arrives and keep it in memory until I shut the server down.

Below is a mock-up of the second idea:

from mod_python import apache
from mod_python import Session

gvar = 0

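def handler(req):
    # Declare the global once, before any use of the name in this function.
    global gvar
    req.content_type = 'text/plain'

    session = Session.Session(req)
    if session.is_new():
        # First request of a new session: show the old value, then set it.
        req.write('gvar was originally : '+str(gvar))
        gvar = 314
        session['addr'] = req.connection.remote_ip
        session.save()
        req.write('\ngvar was just set to: '+str(gvar))
    else:
        # Later sessions see the value kept in the module-level global.
        req.write('gvar set to: '+str(gvar))

    return apache.OK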

output (session one):
gvar was originally : 0
gvar was just set to: 314

output (session > 1):
gvar set to: 314

Please share your comments and solutions. Thanks.
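
As a rough illustration of option 2, the mock-up's global could hold the parsed data itself instead of a number. The sketch below is written for current Python rather than 2.5, and `parse_data_file`, `get_data`, and the one-record-per-line format are hypothetical stand-ins, not part of the original setup. Note that a module-level global lives in one process only, so with several Apache child processes each child would load and hold its own copy.

```python
import threading

_data = None                     # parsed data, shared by all requests in this process
_data_lock = threading.Lock()    # guards the first load when requests arrive concurrently


def parse_data_file(path):
    """Hypothetical stand-in for the real parser: one record per line."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def get_data(path):
    """Return the parsed data, reading and parsing the file on first use only."""
    global _data
    if _data is None:
        with _data_lock:
            if _data is None:    # re-check: another thread may have loaded it meanwhile
                _data = parse_data_file(path)
    return _data
```

A handler would then call `get_data(...)` on every request; only the first call in each process pays the disk and parsing cost.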

A: 

You don't say what form your data is in, but if a key-value store will suffice, you can use shelve together with the OS disk cache to hold the data in a pre-parsed format.

Ignacio Vazquez-Abrams
My data is in text format.
Dragan Chupacabrovic
I would like to avoid serialization, since it doesn't save me much time when I read the data from disk (there is a lot of meta info associated with the data object). Could you also clarify what you mean by "along with OS caching"?
Dragan Chupacabrovic
shelve uses a key-value store kept on disk, so it would be up to the OS to hold the relevant parts in its disk cache.
Ignacio Vazquez-Abrams
I'll remember this for the future, but I'd rather not do disk I/O more than once for this project, since there is no guarantee that the object will be cached, right?
Dragan Chupacabrovic
That is correct.
Ignacio Vazquez-Abrams
A: 

Another option is to use posix_ipc to hold the data in shared memory, available to all processes.

Ignacio Vazquez-Abrams
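
A sketch of what the posix_ipc approach might look like follows. posix_ipc is a third-party package, so the import is guarded, and the helper names `publish` and `fetch` are hypothetical. Shared memory holds raw bytes, so a parsed object would still have to be encoded into bytes before it can be shared this way.

```python
import mmap

try:
    import posix_ipc          # third-party package, not in the standard library
except ImportError:
    posix_ipc = None


def publish(name, payload):
    """Create a shared memory segment called `name` and copy `payload` (bytes) into it."""
    shm = posix_ipc.SharedMemory(name, posix_ipc.O_CREAT, size=len(payload))
    try:
        buf = mmap.mmap(shm.fd, shm.size)
    finally:
        shm.close_fd()        # the mapping stays valid after the descriptor is closed
    buf[:len(payload)] = payload
    buf.close()


def fetch(name, length):
    """Attach to an existing segment and return its first `length` bytes."""
    shm = posix_ipc.SharedMemory(name)
    try:
        buf = mmap.mmap(shm.fd, shm.size, prot=mmap.PROT_READ)
    finally:
        shm.close_fd()
    try:
        return buf[:length]
    finally:
        buf.close()
```

The segment persists until it is unlinked (for example with `posix_ipc.unlink_shared_memory(name)`) or the machine reboots, so one process can publish the data and every Apache child can attach to it by name.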
A: 

You could set up a tmpfs (or ramfs) mount holding the data, and it will stay in RAM (though tmpfs may push data out to swap).

KurzedMetal
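
One way to use the tmpfs idea from Python is to copy the data file onto the RAM-backed mount once and read it from there afterwards. In the sketch below, the helper name `stage_in_ram` is hypothetical, and the default of `/dev/shm` assumes a Linux system where that directory is a tmpfs mount; substitute your own mount point otherwise.

```python
import os
import shutil


def stage_in_ram(src_path, ram_dir="/dev/shm"):
    """Copy src_path into ram_dir unless an up-to-date copy is already there."""
    dst_path = os.path.join(ram_dir, os.path.basename(src_path))
    if (not os.path.exists(dst_path)
            or os.path.getmtime(dst_path) < os.path.getmtime(src_path)):
        shutil.copy2(src_path, dst_path)   # copy2 also preserves the modification time
    return dst_path
```

This only removes the disk reads; the file would still need to be parsed on each request unless the parsed result is cached as well.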