Task is: I have a task queue stored in a DB, and it keeps growing. I need to process the tasks with a Python script whenever I have resources for it. I see two ways:

  1. A Python script that runs all the time. But I don't like this (possible memory leaks).

  2. A Python script called by cron that does a small chunk of the work each run. But then I need to ensure only one script is active in memory at a time (to prevent the number of running scripts from growing). What is the best way to implement this in Python?

Any ideas for solving this problem at all?

+1  A: 

This is a bit of a vague question. One thing you should remember is that it is very difficult to leak memory in Python, because of the automatic garbage collection. Cron-ing a Python script to handle the queue isn't very nice, although it would work fine.

I would use method 1; if you need more power, you could make a small Python process that monitors the DB queue and starts new processes to handle the tasks.

katrielalex
You can leak resources (e.g. forget to close files opened in the global namespace), but memory doesn't really leak. Cyclic references aren't collected as promptly, but they are collected too. Unless there's a bug in the garbage collector, of course.
delnan
True that. And opening files in the global namespace is just icky :p.
katrielalex
A: 

You can use a lockfile to prevent multiple scripts from running from cron. See the answers to an earlier question, "Python: module for creating PID-based lockfile". This is really just good practice in general for anything that must not have multiple instances running, so you should look into it even if you keep the script running constantly, which is what I suggest.

For most things, it shouldn't be too hard to avoid memory leaks, but if you're having a lot of trouble with them (I sometimes do with complex third-party web frameworks, for example), I would suggest instead writing the script with a small, carefully designed main loop that monitors the database for new jobs and uses the multiprocessing module to fork off new processes to complete each task.

When a task is complete, the child process can exit, immediately freeing any memory that isn't properly garbage collected, and the main loop should be simple enough that you can avoid any memory leaks.

This also offers the advantage that you can run multiple tasks in parallel if your system has more than one CPU core, or if your tasks spend a lot of time waiting for I/O.

Nicholas Knight
+1  A: 

I'd suggest using Celery, an asynchronous task queuing system which I use myself.

It may seem a bit heavy for your use case, but it makes it easy to expand later by adding more worker resources if/when needed.

MattH