I have this Python CGI script that checks whether it has been accessed too many times from the same IP and, if everything is OK, reads a big file from disk (11MB) and returns it as a download.

It works, but performance sucks. The bottleneck seems to be reading this huge file over and over:

import os

def download_demo():
    """
    Returns the demo file
    """
    # 'rb': the download is binary, so avoid text-mode newline translation
    with open(FILENAME, 'rb') as f:
        buff = f.read()

    print "Content-Type:application/x-download\nContent-Disposition:attachment;filename=%s\nContent-Length:%s\n\n%s" % (os.path.split(FILENAME)[-1], len(buff), buff)

How can I make this faster? I thought of using a RAM disk to keep the file, but there must be a better solution. Would using mod_wsgi instead of a CGI script help? Would I be able to keep the big file in Apache's memory space?

Any help is greatly appreciated.

+1  A: 

Try reading and writing the file in chunks of, say, 16KB at a time (i.e. buffering it yourself). Python is probably doing something slow behind the scenes, and buffering manually may be faster.

You shouldn't need something like a RAM disk: the OS disk cache ought to keep the file contents in memory for you.
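A minimal sketch of the chunked approach, written for Python 3 (the question's code is Python 2) and assuming the CGI script writes raw bytes to `sys.stdout.buffer` and the file name is ASCII. `copy_in_chunks` and `send_file_cgi` are hypothetical helper names. The size comes from `os.path.getsize`, so `Content-Length` is known before any file data is read, and at most one 16KB chunk is held in memory at a time.

```python
import os
import sys

CHUNK_SIZE = 16 * 1024  # 16KB, the chunk size suggested above


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy binary stream src to dst chunk by chunk; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" signals end of file
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def send_file_cgi(path, out=None):
    """Write CGI download headers, then stream the file without reading it all at once."""
    if out is None:
        out = sys.stdout.buffer  # binary side of stdout in Python 3
    headers = ("Content-Type: application/x-download\n"
               "Content-Disposition: attachment; filename=%s\n"
               "Content-Length: %d\n\n" % (os.path.basename(path), os.path.getsize(path)))
    out.write(headers.encode("ascii"))  # assumes an ASCII file name
    with open(path, "rb") as f:
        copy_in_chunks(f, out)
    out.flush()
```

The standard library's `shutil.copyfileobj(src, dst, 16 * 1024)` runs essentially the same loop, so in practice it could replace `copy_in_chunks`.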

Andrew Medico
+1  A: 

mod_wsgi or FastCGI would help in the sense that the Python interpreter no longer has to be started every time your script runs. However, they'd do little to speed up reading the file (if that really is your bottleneck). I'd advise you to use something along the lines of memcached instead.

oggy
+1  A: 

Why are you printing it all in one print statement? Python has to build several temporary strings for the content headers, and because of that last %s it has to hold the entire contents of the file in two different strings at once. This should be better:

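import sys
# print's own newline ends the header block, so the format string ends with a single \n
print "Content-Type:application/x-download\nContent-Disposition:attachment;filename=%s\nContent-Length:%s\n" % (os.path.split(FILENAME)[-1], len(buff))
sys.stdout.write(buff)  # unlike print, adds no trailing newline after the body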

You might also consider reading the file with raw (unbuffered) I/O from the io module so Python doesn't create temporary buffers that you aren't using.
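A Python 3 sketch combining both suggestions, assuming the CGI script writes bytes to `sys.stdout.buffer` and the file name is ASCII: the headers are formatted on their own, and the body is read through `io.FileIO`, the unbuffered raw layer of the `io` module, then written separately. `send_download` is a hypothetical name.

```python
import io
import os
import sys


def send_download(path, out=None):
    """Write CGI headers, then the file body as a separate write."""
    if out is None:
        out = sys.stdout.buffer  # binary side of stdout in Python 3
    # io.FileIO is the raw, unbuffered layer; readall() reads to EOF
    # without going through a BufferedReader.
    with io.FileIO(path, "r") as raw:
        body = raw.readall()
    headers = ("Content-Type: application/x-download\n"
               "Content-Disposition: attachment; filename=%s\n"
               "Content-Length: %d\n\n" % (os.path.basename(path), len(body)))
    out.write(headers.encode("ascii"))  # small string: headers only
    out.write(body)  # file bytes are never copied into a formatted string
    out.flush()
```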

jmucchiello
+7  A: 

Use mod_wsgi and use something akin to:

def application(environ, start_response):
    status = '200 OK'

    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)

    # Binary mode; the server calls close() on the wrapper, which closes the file.
    file = open('/usr/share/dict/words', 'rb')
    return environ['wsgi.file_wrapper'](file)

In other words, use the wsgi.file_wrapper extension of the WSGI standard to let Apache/mod_wsgi send the file contents in an optimised way using sendfile/mmap. This avoids your application even needing to read the file into memory.
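A sketch of the same idea applied to the download from the question, written for Python 3 and assuming the app runs under mod_wsgi (or any WSGI server). `make_download_app`, `_FileIter`, the `/path/to/demo.zip` path and the 16KB block size are hypothetical. Because `wsgi.file_wrapper` is optional in the WSGI spec (PEP 3333), the sketch falls back to returning the file in fixed-size blocks through an iterable whose `close()` closes the file.

```python
import os

BLOCK_SIZE = 16 * 1024  # hypothetical block size


class _FileIter:
    """Fallback iterable, modelled loosely on the FileWrapper example in PEP 3333."""

    def __init__(self, f, block_size):
        self.f = f
        self.block_size = block_size

    def __iter__(self):
        # Call f.read(block_size) repeatedly until it returns b"" (end of file).
        return iter(lambda: self.f.read(self.block_size), b"")

    def close(self):
        # WSGI servers call close() on the returned iterable when done.
        self.f.close()


def make_download_app(path):
    """Return a WSGI app that serves the file at path as a download."""

    def application(environ, start_response):
        # The per-IP check from the question would go here.
        start_response("200 OK", [
            ("Content-Type", "application/x-download"),
            ("Content-Disposition", "attachment; filename=%s" % os.path.basename(path)),
            ("Content-Length", str(os.path.getsize(path))),
        ])
        f = open(path, "rb")
        wrapper = environ.get("wsgi.file_wrapper")
        if wrapper is not None:
            # Server-provided (mod_wsgi's may use sendfile/mmap); closing it closes f.
            return wrapper(f, BLOCK_SIZE)
        return _FileIter(f, BLOCK_SIZE)

    return application


# Hypothetical path; mod_wsgi looks for a module-level callable named "application".
application = make_download_app("/path/to/demo.zip")
```

Building the app in a factory keeps the file path out of the request handler, which also makes it easy to call the app directly with a fake `environ` and `start_response` when testing.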

Graham Dumpleton