I have decided to learn how multi-threading is done in Python, and I did a comparison to see what kind of performance gain I would get on a dual-core CPU. I found that my simple multi-threaded code actually runs slower than the sequential equivalent, and I can't figure out why.

The test I contrived was to generate a large list of random numbers and then print the maximum:

from random import random
import threading

def ox():
    print max([random() for x in xrange(20000000)])

ox() takes about 6 seconds to complete on my Intel Core 2 Duo, while ox();ox() takes about 12 seconds.

I then tried calling ox() from two threads to see how fast that would complete.

def go():
    r = threading.Thread(target=ox)
    r.start()
    ox()

go() takes about 18 seconds to complete, with the two results printing within 1 second of each other. Why should this be slower?

I suspect ox() is being parallelized automatically, because if I look at the Windows task manager performance tab and call ox() in my Python console, both processors jump to about 75% utilization until it completes. Does Python automatically parallelize things like max() when it can?

+3  A: 
  1. Python has the GIL (Global Interpreter Lock). Python bytecode is only executed by a single thread at a time, so it runs on only one processor at any given moment. Only certain C modules (which don't manage Python state) are able to run concurrently.
  2. The Python GIL has a huge overhead in locking the state between threads. There are fixes for this in newer versions or in development branches, which at the very least should make multi-threaded CPU-bound code as fast as single-threaded code.

You need to use a multi-process framework to parallelize with Python. Luckily, the multiprocessing module which ships with Python makes that fairly easy.
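As a rough sketch of what that could look like for the code in the question: the version below uses multiprocessing.Pool to run the same work in two separate processes. It is written in Python 3 syntax (range instead of xrange), and the n parameter, the run_parallel helper and the workers argument are my own additions for illustration, not part of the original code.

```python
from multiprocessing import Pool
from random import random


def ox(n):
    # Same work as the question's ox(), but it returns the maximum
    # instead of printing it, so the parent process can collect it.
    return max(random() for _ in range(n))


def run_parallel(n, workers=2):
    # Hypothetical helper: each worker is a separate process with its
    # own interpreter and its own GIL, so the calls can run on
    # different cores at the same time.
    with Pool(processes=workers) as pool:
        return pool.map(ox, [n] * workers)


if __name__ == "__main__":
    # The guard is needed on platforms that start workers by
    # re-importing the main module (for example Windows).
    print(run_parallel(20000000))
```

Timing run_parallel against two sequential ox() calls on the same machine should show whether the second core is actually being used.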

Very few languages can auto-parallelize expressions. If that is the functionality you want, I suggest Haskell (Data Parallel Haskell).

Yann Ramin
It's worth noting that threading is still useful in Python for certain types of task, such as I/O-bound operations.
Forest
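To illustrate the I/O-bound case mentioned in the comment above, here is a minimal sketch in Python 3. It assumes time.sleep is a fair stand-in for a blocking I/O call (it releases the GIL while waiting); fake_io and run_threads are made-up names for this example.

```python
import threading
import time


def fake_io(delay, results, index):
    # time.sleep stands in for a blocking read or network call.
    # The GIL is released while the thread waits.
    time.sleep(delay)
    results[index] = delay


def run_threads(delays):
    # Start one thread per simulated I/O call and wait for all of them.
    results = [None] * len(delays)
    threads = [
        threading.Thread(target=fake_io, args=(d, results, i))
        for i, d in enumerate(delays)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_threads([0.3, 0.3, 0.3, 0.3])
    print(results, elapsed)
```

Because the waits overlap, the total time should be close to the longest single delay rather than the sum of all of them, which is the opposite of what happens with the CPU-bound ox() in the question.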
A: 

The problem is in the random() function; try removing random from your code. Both cores try to access the shared state of the random function. The cores end up working sequentially and spend a lot of time on cache synchronization. Such behavior is known as false sharing. Read this article: False Sharing

Andrew