views:

1500

answers:

8

I'm making a program for running simulations in Python, with a wxPython interface. In the program, you can create a simulation, and the program renders (=calculates) it for you. Rendering can be very time-consuming sometimes.

When the user starts a simulation, and defines an initial state, I want the program to render the simulation continuously in the background, while the user may be doing different things in the program. Sort of like a YouTube-style bar that fills up: You can play the simulation only up to the point that was rendered.

Should I use multiple processes or multiple threads or what? People told me to use the multiprocessing package, I checked it out and it looks good, but I also heard that processes, unlike threads, can't share a lot of information (and I think my program will need to share a lot of information.) Additionally I also heard about Stackless Python: Is it a separate option? I have no idea.

Please advise.

+1  A: 

I always prefer multiple threads for simplicity, but there is a real issue with affinity. There is no way (that I know of) to tell Python's threading implementation to bind a thread to a specific processor. This may not be an issue for you, and it doesn't sound like it should be. Unless you have a good reason not to, it sounds like your problem can be solved easily with Python's threading implementation.
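A minimal sketch of that approach: a worker thread fills in frames in the background while the main (GUI) thread stays free to poll the progress. All the names here (`Renderer`, `rendered_up_to`) are invented for illustration, and the `time.sleep` stands in for the expensive calculation.

```python
import threading
import time

class Renderer:
    """Hypothetical background renderer for the simulation."""

    def __init__(self, total_frames=5):
        self.total_frames = total_frames
        self.frames = []                 # frames rendered so far
        self.lock = threading.Lock()

    def start(self):
        # daemon=True: the worker dies with the app instead of blocking exit
        t = threading.Thread(target=self._run, daemon=True)
        t.start()
        return t

    def _run(self):
        for step in range(self.total_frames):
            time.sleep(0.01)             # stand-in for the expensive calculation
            with self.lock:
                self.frames.append(step)

    def rendered_up_to(self):
        # a GUI timer would poll this to grow the "YouTube bar"
        with self.lock:
            return len(self.frames)

renderer = Renderer()
worker = renderer.start()
worker.join()                            # a real GUI would keep running instead
print(renderer.rendered_up_to())         # 5
```

In a wxPython app you would not `join()`; you would let a `wx.Timer` call `rendered_up_to()` periodically and redraw the progress bar.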

If you do decide on using processes, sharing information between subprocesses can be accomplished in several ways: TCP/UDP connections, shared memory, or pipes. It does add some overhead and complexity.
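For the pipe option specifically, the standard multiprocessing module wraps most of the plumbing for you. A minimal sketch (the worker function and the message contents are made up for illustration):

```python
from multiprocessing import Pipe, Process

def worker(conn):
    # child process: send a (picklable) result back to the parent
    conn.send({"frame": 1, "state": [0.0, 1.0]})
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    print(parent_end.recv())    # {'frame': 1, 'state': [0.0, 1.0]}
    p.join()
```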

Shane C. Mason
+1: Threading is a very, very natural format for working with event-driven GUIs, and it helps you avoid the pain of inter-process communication (unless your information sharing needs are well-suited to the limited options Shane mentioned).
ojrac
1. Would threads automatically take advantage of all the cores in the CPU?
2. Do you have an idea how Stackless fits into all of this?
cool-RR
The thing about threads is that they are 'generally' under the control of the OS, and all of the OSes do a pretty good job of distributing the load across the CPUs. This is generally the behavior that you want. You can imagine scenarios where you would want to bind a single task to a single CPU, though.
Shane C. Mason
NO. Python's global interpreter lock (GIL) mandates that only ONE thread may access the interpreter at a time. So you can't take advantage of multi-core processors using Python's threads.
Jason Baker
What Jason says is true, the GIL will not allow concurrent execution on multiple CPUs. I should have been more clear in my statement, the OS decides which CPU it will run on and you will see your application switch CPUs during execution.
Shane C. Mason
Thanks Jason - That's the most important thing for me.
cool-RR
+9  A: 

"I checked it out and it looks good, but I also heard that processes, unlike threads, can't share a lot of information..."

This is only partially true.

Threads are part of a process -- threads share memory trivially. Which is as much of a problem as a help -- two threads with casual disregard for each other can overwrite memory and create serious problems.

Processes, however, share information through a lot of mechanisms. A POSIX pipeline (a | b) means that process a and process b share information: a writes it and b reads it. This works out really well for a lot of things.

The operating system will assign your processes to every available core as quickly as you create them. This works out really well for a lot of things.

Stackless Python is unrelated to this discussion -- it's faster and has different thread scheduling. But I don't think threads are the best route for this.

"I think my program will need to share a lot of information."

You should resolve this first. Then determine how to structure processes around the flow of information. A "pipeline" is very easy and natural to do; any shell will create the pipeline trivially.

A "server" is another architecture where multiple client processes get and/or put information into a central server. This is a great way to share information. You can use the WSGI reference implementation as a way to build a simple, reliable server.
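The pipeline idea can be sketched inside Python itself with multiprocessing queues instead of a shell: one process writes, the other reads as the data arrives. The stage names and numbers below are invented.

```python
from multiprocessing import Process, Queue

def stage_a(q):
    # "a" in the pipeline a | b: writes results
    for step in range(3):
        q.put(step * step)
    q.put(None)                      # sentinel: end of stream

def stage_b(q, out):
    # "b": reads whatever "a" wrote, as it arrives
    while True:
        item = q.get()
        if item is None:
            break
        out.put(item + 1)

if __name__ == "__main__":
    q, out = Queue(), Queue()
    a = Process(target=stage_a, args=(q,))
    b = Process(target=stage_b, args=(q, out))
    a.start()
    b.start()
    a.join()
    b.join()
    print([out.get() for _ in range(3)])   # [1, 2, 5]
```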

S.Lott
+10  A: 
  • Stackless: uses one CPU. "Tasklets" must yield voluntarily. The preemption option doesn't work all the time.
  • Threaded: uses one CPU. Native threads share time somewhat randomly, switching after every 20-100 Python opcodes.
  • Multiprocessing: uses multiple CPUs

Update

In-depth Analysis

Use threading for an easy time. However, if you call C routines that take a long time before returning, then this may not be a choice if your C routine does not release the GIL.

Use multiprocessing if your workload is limited by CPU power and you need maximum responsiveness.

Don't use Stackless: I have had it segfault before, and threads are pretty much equivalent unless you are using hundreds of them or more.

Unknown
That's the first time I've ever heard someone say threading was easy. IMO threaded code is very hard to write well.
Bryan Oakley
+4  A: 

With CPython, multiple threads can't execute Python bytecode at the same time because of the GIL (global interpreter lock).

I think it's still possible for threads to boost your application, e.g. one thread might block on I/O while another one does some work.
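A small sketch of that overlap: CPython releases the GIL while a thread waits, so the main thread keeps computing during the wait. Here a `time.sleep` stands in for a blocking read or download.

```python
import threading
import time

results = {}

def slow_io():
    time.sleep(0.05)                 # stand-in for a blocking read or download
    results["io"] = "data"

start = time.perf_counter()
t = threading.Thread(target=slow_io)
t.start()
# the main thread keeps computing while the other thread waits on "I/O"
results["work"] = sum(i * i for i in range(10_000))
t.join()
elapsed = time.perf_counter() - start
print(results["io"])                 # data
```

The total elapsed time is roughly the longer of the two tasks, not their sum, which is the whole point of threading I/O-bound work.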

If you've never used threads, I suggest that you try them first. It will be useful in any other language, and you'll find a lot of resources on the web. Then if you realize that you need more parallelism, you can still switch to processes.

Bastien Léonard
+5  A: 

A process has its own memory space. This makes it more difficult to share information, but also makes the program safer (there is less need for explicit synchronization). That being said, processes can share the same memory in read-only mode.

A thread is cheaper to create or kill, but the main difference is that it shares memory with other threads in the same process. This is sometimes risky; in addition, a crash in one thread would kill the process and all its threads with it.

One advantage of using multiple processes over multiple threads is that it would be easier to scale your program to work with multiple machines that communicate via network protocols.

For example, you could potentially run 16 processes on 8 dual-core machines, but would not have a benefit from more than 4 threads on a quad-core machine. If the amount of information you need to communicate is low, multiprocessing may make more sense.

As for the YouTube-style bar you've described, I would say that suggests multiprocessing. If you follow MVC approaches, your GUI should not also contain the model (calculation result). With multiprocessing, you can then communicate with a work manager that can report what data is already available.
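A sketch of that work-manager idea (all names invented): the renderer process reports each finished frame over a queue, and the GUI side drains the queue to learn how far the playable range extends.

```python
from multiprocessing import Process, Queue

def render(progress):
    # model process: renders frames and reports each finished one
    for frame in range(4):
        # ... expensive simulation step would happen here ...
        progress.put(frame)
    progress.put(None)               # sentinel: rendering finished

if __name__ == "__main__":
    progress = Queue()
    p = Process(target=render, args=(progress,))
    p.start()
    playable = []                    # a wx.Timer would poll with get_nowait()
    while (frame := progress.get()) is not None:
        playable.append(frame)       # the "bar" can now fill to this frame
    p.join()
    print(playable)                  # [0, 1, 2, 3]
```

In the real GUI you would poll with `get_nowait()` from a timer callback instead of blocking on `get()`, so the interface stays responsive.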

Uri
"processes can share the same memory in read-only mode" — I think that will be very useful to me. How do I do that?
cool-RR
On most UNIX systems, when you fork a process (create one from the other), the two share the same read-only pages until one of them writes (copy-on-write). It saves loading the program code. But it's not that useful as a programming technique.
Uri
Unfortunately, that's not the case on Windows, though (Windows doesn't have os.fork available).
Jon Cage
+7  A: 

There was a good talk on multiprocessing at PyCon this year. The takeaway message was: "Don't use multiprocessing unless you're sure you have a problem that it will solve and that cannot be solved with threads; otherwise, use threads."

Processes have a lot of overhead, and all data to be shared among processes must be serializable (i.e. picklable).
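A quick way to check whether a given object can cross a process boundary is to try pickling it yourself. The simulation state below is made up; the lock is a stand-in for the kind of object (locks, sockets, GUI handles) that cannot be pickled.

```python
import pickle
import threading

state = {"t": 0.5, "positions": [(0.0, 1.0), (2.0, 3.0)]}
data = pickle.dumps(state)           # plain data structures pickle fine
assert pickle.loads(data) == state

try:
    pickle.dumps(threading.Lock())   # locks and OS handles do not
except TypeError as exc:
    print("not shareable:", exc)
```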

You can see the slides and video here: http://us.pycon.org/2009/conference/schedule/event/31/

Vicki Laidler
That's unfortunate, as that is almost the opposite of what you'd do in other languages where possible. Threads are error-prone and limited compared to processes, and in Python you get the GIL problem to add insult to injury.
Kylotan
while it is true that multiple processes have a small bit of runtime overhead (though that's much less true than it was five or ten years ago), threaded code has a very large amount of programming overhead. It takes smart people to write good threaded code, and _very_ smart people to debug it.
Bryan Oakley
A: 

It sounds like you'd want threading.

The way you described it, it sounded like there was one single thing that actually took a lot of CPU...the actual running of the simulation.

What you're trying to get is more responsive displays, by allowing user interaction and graphics updates while the simulation is running. This is exactly what python's threading was built for.

What this will NOT get you is the ability to take advantage of multiple cores/processors on your system. I have no idea what your simulation looks like, but if it is that CPU intensive, it might be a good candidate for splitting up. In this case, you can use multiprocessing to run separate parts of the simulation on separate cores/processors. However, this isn't trivial: you now need some way to pass data back and forth between the processes, as the separate processes can't easily access the same memory space.
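If the simulation does decompose into independent pieces, a process pool is the least-ceremony way to fan them out. A sketch under that assumption; `simulate_region` is a stand-in for advancing one independent chunk of the simulation.

```python
from multiprocessing import Pool

def simulate_region(region):
    # stand-in for advancing one independent region of the simulation
    return sum(x * x for x in region)

if __name__ == "__main__":
    regions = [range(0, 100), range(100, 200), range(200, 300)]
    with Pool() as pool:
        # one region per worker process; results come back pickled
        partials = pool.map(simulate_region, regions)
    print(sum(partials))             # same total as a serial loop would give
```

Pool.map handles the data passing for you, but both the arguments and the return values must be picklable, which is the real constraint when splitting a simulation this way.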

teeks99
+3  A: 

If you would like to read a lengthy discussion of multi-threading in Mozilla, consider having a look at this discussion, which started in 2000. It doesn't necessarily answer your question, but it is an in-depth discussion that I believe is interesting and informative, and it may be quite valuable because you've asked a difficult question. Hopefully it'll help you make an informed decision.

Incidentally, several members of the Mozilla project (notably Brendan Eich, Mozilla's CTO and the creator of JavaScript) were quite critical of multi-threading in particular. Some of the material referenced here, here, here, and here supports such a conclusion.

Hope that helps and good luck.

Brian M. Hunt