The dot product of two n-dimensional vectors u=[u1,u2,...,un] and v=[v1,v2,...,vn] is given by u1*v1 + u2*v2 + ... + un*vn.
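As a quick worked example (not in the original post): with u = [1, 2, 3] and v = [4, 5, 6], the dot product is 1*4 + 2*5 + 3*6 = 32.

```python
u = [1, 2, 3]
v = [4, 5, 6]

# Apply the definition term by term: u1*v1 + u2*v2 + u3*v3
dot = sum(a * b for a, b in zip(u, v))
print(dot)  # 32
```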

A question posted yesterday encouraged me to find the fastest way to compute dot products in Python using only the standard library (no third-party modules, and no calls out to C/Fortran/C++).

I timed four different approaches; so far the fastest seems to be sum(starmap(mul,izip(v1,v2))) (where starmap and izip come from the itertools module).

For the code presented below, these are the elapsed times (in seconds, for one million runs):

d0: 12.01215
d1: 11.76151
d2: 12.54092
d3: 09.58523

Can you think of a faster way to do this?

import timeit # module with timing subroutines                                                              
import random # module to generate random numbers
from itertools import imap,starmap,izip
from operator import mul

def v(N=50, vmin=-10, vmax=10):
    """Generates a random vector (as a list) of dimension N; the
    values are integers in the range [vmin, vmax]."""
    out = []
    for k in range(N):
        out.append(random.randint(vmin, vmax))
    return out

def check(v1,v2):
    if len(v1)!=len(v2):
        raise ValueError, "the length of both vectors must be the same"

def d0(v1,v2):
    """                                                                                                     
    d0 is Nominal approach:                                                                                 
    multiply/add in a loop                                                                                  
    """
    check(v1,v2)
    out = 0
    for k in range(len(v1)):
        out += v1[k] * v2[k]
    return out

def d1(v1,v2):
    """                                                                                                     
    d1 uses an imap (from itertools)                                                                        
    """
    check(v1,v2)
    return sum(imap(mul,v1,v2))

def d2(v1,v2):
    """                                                                                                     
    d2 uses a conventional map                                                                              
    """
    check(v1,v2)
    return sum(map(mul,v1,v2))

def d3(v1,v2):
    """                                                                                                     
    d3 uses a starmap (itertools) to apply the mul operator on an izipped (v1,v2)                           
    """
    check(v1,v2)
    return sum(starmap(mul,izip(v1,v2)))

# generate the test vectors                                                                                 
v1 = v()
v2 = v()

if __name__ == '__main__':

    # Generate two test vectors of dimension N                                                              

    t0 = timeit.Timer("d0(v1,v2)","from dot_product import d0,v1,v2")
    t1 = timeit.Timer("d1(v1,v2)","from dot_product import d1,v1,v2")
    t2 = timeit.Timer("d2(v1,v2)","from dot_product import d2,v1,v2")
    t3 = timeit.Timer("d3(v1,v2)","from dot_product import d3,v1,v2")

    print "d0 elapsed: ", t0.timeit()
    print "d1 elapsed: ", t1.timeit()
    print "d2 elapsed: ", t2.timeit()
    print "d3 elapsed: ", t3.timeit()

Note that the file must be named dot_product.py for the script to run; I used Python 2.5.1 on Mac OS X 10.5.8.
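For readers on Python 3 (where itertools.imap and izip are gone, and the built-in map and zip are already lazy), a rough port of the candidates might look like the sketch below. This is my own adaptation, not part of the original script; note that d1 and d2 collapse into the same thing on Python 3.

```python
from itertools import starmap
from operator import mul

def d0(v1, v2):
    """Nominal approach: multiply/add in a loop."""
    out = 0
    for k in range(len(v1)):
        out += v1[k] * v2[k]
    return out

def d1(v1, v2):
    """Python 3's map is lazy, so this covers both d1 (imap) and d2 (map)."""
    return sum(map(mul, v1, v2))

def d3(v1, v2):
    """starmap over zip; Python 3's zip is lazy, like Python 2's izip."""
    return sum(starmap(mul, zip(v1, v2)))

v1 = [1, -2, 3]
v2 = [4, 5, -6]
# All three must agree with the hand-computed value 1*4 + (-2)*5 + 3*(-6) = -24
assert d0(v1, v2) == d1(v1, v2) == d3(v1, v2) == -24
```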

EDIT:

I ran the script for N=1000 and these are the results (in seconds, for one million runs):

d0: 205.35457
d1: 208.13006
d2: 230.07463
d3: 155.29670

I guess it is safe to assume that, indeed, option three is the fastest and option two the slowest (of the four presented).

+2  A: 

I don't know about faster, but I'd suggest:

sum(i*j for i, j in zip(v1, v2))

it's much easier to read and doesn't require even standard-library modules.

SilentGhost
@SilentGhost: your approach takes much longer. For N=10 it took 18.0258 seconds (one million runs). What I am looking for is speed; indeed readability is a non-issue, since the dot product is called from a function (udotv=dot(u,v)), and I can comment the code as much as I need to in the definition of dot. Your approach really is not appropriate.
Arrieta
@SilentGhost, a quick observation: changing zip to itertools.izip reduces the time to 15.84879. Maybe worth knowing?
Arrieta
if performance is such a big deal, write it in C
SilentGhost
Sorry, SilentGhost. I think you are missing the point.
Arrieta
This is definitely what I would do. I throw in psyco for performance on Windows if that is an issue.
hughdbrown
No Psyco: 18.6840143091. With Psyco: 25.0433867992. This must be one of those "worst case" optimizations for psyco for some reason. Using izip() (without psyco) only knocked it down to 17.4570938485.
Seth
+3  A: 

Just for fun I wrote a "d4" which uses numpy:

from numpy import dot
def d4(v1, v2): 
    check(v1, v2)
    return dot(v1, v2)

My results (Python 2.5.1, XP Pro sp3, 2GHz Core2 Duo T7200):

d0 elapsed:  12.1977242918
d1 elapsed:  13.885232341
d2 elapsed:  13.7929552499
d3 elapsed:  11.0952246724

d4 elapsed: 56.3278584289 # go numpy!

And, for even more fun, I turned on psyco:

d0 elapsed:  0.965477735299
d1 elapsed:  12.5354792299
d2 elapsed:  12.9748163524
d3 elapsed:  9.78255448667

d4 elapsed: 54.4599059378

Based on that, I declare d0 the winner :)


Update

@kaiser.se: I probably should have mentioned that I did convert everything to numpy arrays first:

from numpy import array
v3 = [array(vec) for vec in v1]
v4 = [array(vec) for vec in v2]

# then
t4 = timeit.Timer("d4(v3,v4)","from dot_product import d4,v3,v4")

And I included check(v1, v2) since it's included in the other tests. Leaving it off would give numpy an unfair advantage (though it looks like it could use one). The array conversion shaved off about a second (much less than I thought it would).

All of my tests were run with N=50.

@nikow: I'm using numpy 1.0.4, which is undoubtedly old, it's certainly possible that they've improved performance over the last year and a half since I've installed it.


Update #2

@kaiser.se Wow, you are totally right. I must have been thinking that these were lists of lists or something (I really have no idea what I was thinking ... +1 for pair programming).

How does this look:

v3 = array(v1)
v4 = array(v2)

New results:

d4 elapsed:  3.22535741274

With Psyco:

d4 elapsed:  2.09182619579

d0 still wins with Psyco, but numpy is probably better overall, especially with larger data sets.

Yesterday I was a bit bothered by my slow numpy result, since presumably numpy is used for a lot of computation and has had a lot of optimization. Obviously, though, I was not bothered enough to check my result :)

Seth
Great findings, Seth! First, it is incredible that numpy is so slow; I would expect it to be much faster. Also, I had no clue about Psyco (and I considered myself a Python junkie!); thanks for pointing it out, I will definitely check it out. Finally, it is interesting to see that Psyco essentially produced optimized machine code for the plain loop in d0 but did not know how to optimize d3. I guess the message is that, if you want to use Psyco, you should lay the algorithm out plainly so it can be optimized, rather than "hide" its logic behind Python constructs. Again, great findings!
Arrieta
Maybe something is wrong with your numpy install. On my machine numpy is much faster than the other options (I didn't try psyco). And N=50 is a little small for numpy to show its strength.
nikow
you're doing it wrong. make numpy arrays once (instead of passing lists that will be converted by numpy *each time*), and numpy will be much faster. also drop the check.
kaizer.se
you're doing it **extremely** wrong. You are passing a list to numpy. A list of single-element numpy arrays, in fact.
kaizer.se
Thanks for the update! Just another example that it is hard to use numpy correctly.
kaizer.se
+1  A: 

Here is a comparison with numpy. We compare the fast starmap approach with numpy.dot

First, iteration over normal Python lists:

$ python -mtimeit "import numpy as np; r = range(100)" "np.dot(r,r)"
10 loops, best of 3: 316 usec per loop

$ python -mtimeit "import operator; r = range(100); from itertools import izip, starmap" "sum(starmap(operator.mul, izip(r,r)))"
10000 loops, best of 3: 81.5 usec per loop

Then numpy ndarray:

$ python -mtimeit "import numpy as np; r = np.arange(100)" "np.dot(r,r)"
10 loops, best of 3: 20.2 usec per loop

$ python -mtimeit "import operator; import numpy as np; r = np.arange(100); from itertools import izip, starmap;" "sum(starmap(operator.mul, izip(r,r)))"
10 loops, best of 3: 405 usec per loop

Judging from this, numpy.dot on numpy arrays is fastest, followed by the Python functional constructs working on plain lists; each approach is slowest when fed the other's data type.

kaizer.se
A: 

Please benchmark this "d2a" function, and compare it to your "d3" function.

from itertools import imap, starmap
from operator import mul

def d2a(v1,v2):
    """
    d2a uses itertools.imap
    """
    check(v1,v2)
    return sum(imap(mul, v1, v2))

map (on Python 2.x, which is what I assume you use) unnecessarily creates a dummy list prior to the computation.
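(Side note, not in the original answer: on Python 3, map is lazy and returns an iterator rather than a list, so d2 and d2a become identical there. A quick check:)

```python
from operator import mul

result = map(mul, [1, 2, 3], [4, 5, 6])
# On Python 3, map does not materialize a list up front:
print(isinstance(result, list))  # False on Python 3
print(sum(result))               # 32
```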

ΤΖΩΤΖΙΟΥ