views:

47

answers:

1

I am doing some research this summer, working on parallelizing pre-existing code. The main focus right now is a way to load-balance the code so that it will run more efficiently on the cluster. The current task is to make a proof of concept that creates several processes, each with its own stack of work; when a process has finished processing its stack, it queries the two closest processes to see if they have any more work available in their stacks.

I am having difficulty conceptualizing this in Python, but I was hoping someone could point me in the right direction or share an example of something similar in either mpi4py or ParallelPython. Also, if anyone knows of a better or easier module, that would be great to know.

Thanks.

+1  A: 

Here's a simple way to do this.

  1. Create a single, common, shared queue of work to do. An application will fill this queue with work.

  2. Create an application which gets one item from the queue, and does the work.

This is the single-producer-multiple-consumer design. It works well and can swamp your machine with parallel processes.

To use the built-in Queue class (http://docs.python.org/library/queue.html), you need to wrap the queue in some kind of multi-processing API. Personally, I like to create a small HTTP-based web server that handles the queue: each application does a GET to fetch the next piece of work.
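Here is a hedged sketch of that HTTP front-end, using only the standard library. The `/work` path, the job strings, and the choice of 204 for "queue drained" are my own illustrative choices:

```python
# A tiny HTTP server that hands out work items, one per GET request.
import http.server
import queue
import threading
import urllib.request

work_q = queue.Queue()
for item in ["job-1", "job-2", "job-3"]:
    work_q.put(item)

class WorkHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            body = work_q.get_nowait().encode()
            self.send_response(200)
        except queue.Empty:
            body = b""
            self.send_response(204)    # queue drained: no content
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # silence per-request logging
        pass

# Port 0 lets the OS pick a free port; serve in a background thread.
server = http.server.HTTPServer(("127.0.0.1", 0), WorkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/work" % server.server_port
first = urllib.request.urlopen(url).read().decode()
print(first)   # → job-1
```

In real use each worker process would loop on that GET until it receives the "drained" response, then exit.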

You can use tools like RabbitMQ to create a very nice shared queue. http://nathanborror.com/posts/2009/may/20/working-django-and-rabbitmq/

You might be able to use http://hjb.python-hosting.com/ to make use of JMS queues.

You'll need a small application to create and fill the queue with work.

Create as many copies of the application as you like. For example:

for i in 1 2 3 4 5 6 7 8 9 10
do
    python myapp.py &
done

This will run 10 concurrent copies of your application. All 10 are trying to get work from a single queue. They will use all available CPU resources and the OS will nicely schedule them for you.


Peer (node-to-node) synchronization means you have O(n*(n-1)/2) communication paths among all nodes.

The "two adjacent nodes" design still means 2*n communication paths, and work has to "somehow" trickle among the nodes. If the nodes are all initially seeded with work, then someone did a lot of planning to balance the workload. If you're going to do that much planning, why ask the nodes to synchronize at all?

If the queues are not carefully balanced to begin with, then every even node could be slow and every odd node could be fast. The odd nodes finish first, check for work from two even nodes, and those nodes are (a) not done and (b) don't have any spare work to hand off, either. What now? Half the nodes are working, half are idle, all due to poor planning in the initial distribution of work.

Master-slave means you have n communication paths. Further, the balancing is automatic since all idle nodes have equal access to work. There's no such thing as a biased initial distribution that leads to poor overall performance.

S.Lott
Thank you, this is similar to the current design of the program, but they want a node-to-node structure rather than a master-slave structure. Also, would a single queue that is accessed many times by multiple processes create a possible bottleneck on a cluster?
DistortedLojik
@DistortedLojik: A single queue should not be a problem unless each work packet is microscopically small. Queue access is pure overhead, so you want its cost amortized over a lot of high-value work.
S.Lott