I wrote a Python script to process some data from CSV files. The script takes between 3 and 30 minutes to complete, depending on the size of the CSV.
Now I want to put a web interface in front of it, so I can upload the CSV data files from anywhere. I wrote a basic HTTP POST upload page and used Python's CGI module - but the script just times out after some time.
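For reference, the CGI side of the upload looks roughly like this (the "csvfile" field name is a placeholder for whatever the form actually uses, not my exact code):

import cgi

form = cgi.FieldStorage()
upload = form["csvfile"]                      # the file input from the POST form
csvRecords = upload.file.read().splitlines()  # one entry per CSV line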
The script outputs HTTP headers at the start, and then prints a bit of data after each line of the CSV it processes. For example, the print statement below fires every 30 seconds or so.
# at the very top, with the 'import's
print "Content-type: text/html\n\n Processing ... <br />"

# the really long loop
count = 0
for currentRecord in csvRecords:
    count = count + 1
    print "On line " + str(count) + " <br />"
I assumed the browser would receive the headers and then wait, since it keeps receiving little bits of data. But what actually seems to happen is that it doesn't receive any data at all, and the request dies with an Error 504 when given a CSV with lots of lines.
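If the problem is buffered output, I'd guess the fix would look something like this, flushing stdout after every print (just my guess - I haven't confirmed this helps):

import sys

print "Content-type: text/html\n\n Processing ... <br />"
sys.stdout.flush()            # push the headers out immediately

count = 0
for currentRecord in csvRecords:
    count = count + 1
    print "On line " + str(count) + " <br />"
    sys.stdout.flush()        # push each progress line to the server as it happens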
Perhaps there's some buffering or caching happening somewhere? From the logs:
[Wed Jan 20 16:59:09 2010] [error] [client ::1] Script timed out before returning headers: datacruncher.py, referer: http://localhost/index.htm
[Wed Jan 20 17:04:09 2010] [warn] [client ::1] Timeout waiting for output from CGI script /Library/WebServer/CGI-Executables/datacruncher.py, referer: http://localhost/index.htm
What's the best way to resolve this, or is it simply not appropriate to run such scripts in a browser?
Edit: This is a script for my own use - I normally intend to run it on my own computer, but I thought a web-based interface could come in handy while travelling, or, say, from a phone. Also, there's really nothing to download - the script will most probably e-mail a report off at the end.
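For reference, the e-mail step at the end would be something like this (the addresses and mail server are placeholders, and report_text stands for whatever the loop builds up):

import smtplib
from email.mime.text import MIMEText

# build the report message; addresses and server below are placeholders
msg = MIMEText(report_text)
msg["Subject"] = "CSV processing report"
msg["From"] = "datacruncher@localhost"
msg["To"] = "me@example.com"

server = smtplib.SMTP("localhost")
server.sendmail(msg["From"], [msg["To"]], msg.as_string())
server.quit()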