views: 155
answers: 4
Dear pythoners,

I have a problem with using Twisted for simple concurrency in Python. The problem is that I don't know how to do it, and all the online resources are about Twisted's networking abilities. So I am turning to the SO gurus for some guidance.

Python 2.5 is used.

Simplified version of my problem runs as follows:

  1. A bunch of scientific data
  2. A function that munches on the data and creates output
  3. ??? < here enters concurrency, it takes chunks of data from 1 and feeds it to 2
  4. Output from 3 is joined and stored

My guess is that the Twisted reactor can do job number three. But how?

Thanks a lot for any help and suggestions.

upd1:

Simple example code. I have no idea how the reactor deals with processes, so I have given it imaginary methods:

datum = 'abcdefg'

def dataServer(data):
    for char in data:
        yield char

def dataWorker(char):
    return ord(char)

r = reactor()                      # imaginary
NUMBER_OF_PROCESSES_AV = 4
serv = dataServer(datum)
task_id = 0
result = [None] * len(datum)

while r.working():
    if NUMBER_OF_PROCESSES_AV > 0:
        r.addTask(dataWorker, serv.next(), task_id)   # imaginary
        NUMBER_OF_PROCESSES_AV -= 1
        task_id += 1
    for pr, tid in r.finishedProcesses():             # imaginary
        result[tid] = pr
        NUMBER_OF_PROCESSES_AV += 1                   # free a worker slot again
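For reference, the dataflow I have in mind can be sketched with the stdlib threading and queue modules (both available in 2.5, where they are named `threading` and `Queue`; the Python 3 names are used below). The thread pool here is a stand-in for the imaginary reactor:

```python
# A runnable sketch of the intended dataflow: a pool of worker threads
# pulls (index, chunk) pairs from a task queue and writes results into
# a shared list, preserving the original order.
import threading
from queue import Queue

def data_server(data):
    # step 1: a bunch of data, yielded chunk by chunk with its index
    for i, char in enumerate(data):
        yield i, char

def data_worker(char):
    # step 2: munch on one chunk and produce output
    return ord(char)

def munch(data, n_workers=4):
    tasks = Queue()
    results = [None] * len(data)

    def worker():
        while True:
            item = tasks.get()
            if item is None:          # sentinel: no more work
                break
            i, char = item
            results[i] = data_worker(char)   # step 3: concurrent munching

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in data_server(data):
        tasks.put(item)
    for _ in threads:                 # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return results                    # step 4: output joined in order

print(munch('abcdefg'))  # [97, 98, 99, 100, 101, 102, 103]
```

Note that because of the GIL this interleaves rather than parallelizes pure-Python computation; the answers below discuss process-based alternatives.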
+2  A: 

To actually compute things concurrently, you'll probably need to employ multiple Python processes. A single Python process can interleave calculations, but because of the GIL it won't execute them in parallel (with a few exceptions).

Twisted is a good way to coordinate these multiple processes and collect their results. One library oriented towards solving this task is Ampoule. You can find more information about Ampoule on its Launchpad page: https://launchpad.net/ampoule.

Jean-Paul Calderone
Can you provide example code related to my problem? There does not seem to be any documentation.
Rince
The examples should get you started. I don't see them hosted anywhere on the web, but if you download the 0.2.0 release, you'll find them in the "examples" directory.
Jean-Paul Calderone
+2  A: 

Do you need Twisted at all?

From your description of the problem I'd say that multiprocessing would fit the bill. Create a number of Process objects that are given a reference to a single Queue instance. Get them to start their work and put their results on the Queue. Just use blocking get()s to read the results.
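A minimal sketch of this suggestion (the chunking scheme and function names are illustrative, not prescribed by the answer): several Process objects share one result Queue, and the parent collects output with blocking get()s. multiprocessing ships with 2.6+; for 2.5 you would need the backport mentioned below.

```python
# Several worker processes put (index, result) pairs on a shared Queue;
# the parent performs one blocking get() per expected result.
from multiprocessing import Process, Queue

def worker(chunk, out):
    # munch on one chunk of (index, char) pairs and report results
    for i, char in chunk:
        out.put((i, ord(char)))

def munch(data, n_procs=4):
    out = Queue()
    indexed = list(enumerate(data))
    # deal the data out round-robin, one chunk per process
    chunks = [indexed[i::n_procs] for i in range(n_procs)]
    procs = [Process(target=worker, args=(c, out)) for c in chunks]
    for p in procs:
        p.start()
    results = [None] * len(data)
    for _ in range(len(data)):        # blocking get() per expected result
        i, value = out.get()
        results[i] = value
    for p in procs:
        p.join()
    return results

if __name__ == '__main__':
    print(munch('abcdefg'))  # [97, 98, 99, 100, 101, 102, 103]
```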

quamrana
Sadly my institution uses Python 2.5 and does not have any plans of going to Python 2.6 as for now. So no multiprocessing goodness.
Rince
Except that multiprocessing is available as a backport of the 2.6 module.
quamrana
+4  A: 

As Jean-Paul said, Twisted is great for coordinating multiple processes. However, unless you specifically need Twisted and simply want a distributed processing pool, there are possibly better-suited tools out there.

One I can think of which hasn't been mentioned is celery. Celery is a distributed task queue - you set up a queue of tasks backed by a DB, Redis, or RabbitMQ (you can choose from a number of free software options), and write a number of compute tasks. These can be arbitrary scientific-computing-type tasks. Tasks can spawn subtasks (implementing the "joining" step you mention above). You then start as many workers as you need and compute away.

I'm a heavy user of Twisted and Celery, so in any case, both options are good.

rlotun
Can you provide some example code, pretty please?
Rince
Well, I'll use the example on the celery website. To mirror the example you have above, you'd first write a number of tasks. A task is essentially your dataWorker: `from celery.decorators import task` / `@task` / `def dataWorker(chara): return ord(chara)`. You can write as many tasks as you please - conceptually they're just functions that *do something*. Then, elsewhere - perhaps in your dataServer - you simply schedule the task: `result = dataWorker.delay(chara)`. You can think of the result as a deferred - you can either wait on it or check on it later.
rlotun
Ok, I forgot that code in comments don't show up well, but essentially check the celery website for a near analogue of what you're trying to do. Remember you have three components: 1) Tasks, which are run by workers 2) A Queue system to hold the tasks 3) A place to store results of tasks. Celery can work with Django seamlessly as well.
rlotun
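The snippet from rlotun's comment above, reflowed into a readable block. It uses the old `celery.decorators` API from the celery of that era, and a broker (RabbitMQ, Redis, or a DB) must be configured and workers started before the task will actually execute, so this is a sketch rather than something runnable on its own:

```python
from celery.decorators import task

@task
def dataWorker(chara):
    return ord(chara)

# elsewhere - perhaps in your dataServer - schedule the task;
# the returned AsyncResult behaves much like a deferred:
result = dataWorker.delay('a')
# result.get() blocks until a worker has computed the value
```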
+1  A: 

It seems to me that you are misunderstanding the fundamentals of how Twisted operates. I recommend you give Dave Peticolas' Twisted Intro a shot. It has been a great help to me, and I've been using Twisted for years!

HINT: Everything in Twisted relies on the reactor!

The Reactor Loop

jathanism