I am working on a Django application that allows users to upload files. I need to perform some server-side processing on these files before sending them on to Amazon S3. After reading the responses to this question and this blog post, I decided that the best way to handle this is to have my view handler invoke a method on a Pyro remote object to perform the processing asynchronously and then immediately return an HTTP 200 to the client. I have this prototyped and it seems to work well. However, I would also like to store the state of the processing so that the client can poll the application to see whether the file has been processed and uploaded to S3.
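For readers unfamiliar with the pattern, here is a minimal sketch of the "hand off the work and return at once" flow. It assumes a local `ThreadPoolExecutor` as a stand-in for the Pyro remote object, so it runs without a Pyro server. `FileProcessor`, `process_and_upload`, and `upload_view` are hypothetical names. A real Django view would return an `HttpResponse` rather than a tuple.

```python
from concurrent.futures import ThreadPoolExecutor
import time


class FileProcessor:
    """Hypothetical stand-in for the Pyro remote object.

    In the real setup this method would run in a separate Pyro server
    process; here it runs on a local worker thread instead.
    """

    def __init__(self) -> None:
        self.processed: list[str] = []

    def process_and_upload(self, file_id: str) -> None:
        time.sleep(0.1)  # placeholder for the processing and the S3 upload
        self.processed.append(file_id)


executor = ThreadPoolExecutor(max_workers=2)
processor = FileProcessor()


def upload_view(file_id: str) -> tuple[int, str]:
    """Hypothetical view: dispatch the work, then return without waiting."""
    executor.submit(processor.process_and_upload, file_id)
    return 200, f"accepted {file_id}"
```

Calling `upload_view("abc123")` returns `(200, "accepted abc123")` immediately, while the processing finishes later on the worker thread. This gap is exactly why the client needs somewhere to poll for status.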
I can handle the polling easily enough, but I am not sure where the appropriate place to store the processing state is. It needs to be writable by the Pyro process and readable by my polling view.
- I am hesitant to add columns to the database for data which should really only persist for 30 to 60 seconds.
- I have considered using Django's low-level cache API with the file ID as the key; however, I don't believe this is really what the cache framework is designed for, and I'm not sure what unforeseen problems this route might cause.
- Lastly, I have considered storing the state in the Pyro object that does the processing, but then it still seems like I would need a boolean "processing_complete" database column so that the view knows whether or not to query the Pyro object for state.
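The cache option above amounts to a key-per-file-ID store with an expiry. The sketch below uses a small in-memory class that mimics the shape of Django's `cache.set(key, value, timeout)` / `cache.get(key, default)` calls. It is an illustrative stand-in, not Django's cache itself. `StatusStore`, `report`, `poll`, and the `upload-status:` key prefix are hypothetical. With a real cache, the Pyro process and the web process would need a shared backend such as memcached or the database cache. Django's local-memory backend is per-process, so the two processes would not see each other's entries.

```python
import threading
import time
from typing import Optional


class StatusStore:
    """Minimal in-memory key/value store with per-entry expiry.

    Hypothetical stand-in for a shared cache backend; it only works
    within a single process.
    """

    def __init__(self) -> None:
        self._data: dict[str, tuple[str, float]] = {}
        self._lock = threading.Lock()

    def set(self, key: str, value: str, timeout: float) -> None:
        with self._lock:
            self._data[key] = (value, time.monotonic() + timeout)

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        with self._lock:
            entry = self._data.get(key)
            if entry is None:
                return default
            value, expires_at = entry
            if time.monotonic() >= expires_at:
                del self._data[key]  # lazily evict expired entries
                return default
            return value


STATUS_TIMEOUT = 60  # seconds; roughly the 30-60 s lifetime described above
store = StatusStore()


def status_key(file_id: str) -> str:
    return f"upload-status:{file_id}"


def report(file_id: str, state: str) -> None:
    """Worker side: record progress as it happens."""
    store.set(status_key(file_id), state, STATUS_TIMEOUT)


def poll(file_id: str) -> str:
    """Polling-view side: a missing key means unknown or expired."""
    return store.get(status_key(file_id), default="unknown")
```

Because an absent key already reads as "unknown", the polling view does not need a separate "processing_complete" column to decide whether to look up state. However, it must treat "unknown" as a real answer, since cache entries can expire or be evicted early.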
Of course, decoupling state from the database also raises data-integrity concerns (what happens if the server goes down while all this data is in memory?). I would like to hear how more seasoned web application developers handle this sort of stateful processing.