views:

461

answers:

4

I'm trying to use Python and Numpy/Scipy to implement an image processing algorithm. The profiler tells me a lot of time is being spent in the following function (called often), which tells me the sum of square differences between two images

def ssd(A,B):
    s = 0
    for i in range(3):
        s += sum(pow(A[:,:,i] - B[:,:,i],2))
    return s

How can I speed this up? Thanks.

+4  A: 

Just

s = numpy.sum((A[:,:,0:3]-B[:,:,0:3])**2)

(which I expect is likely just sum((A-B)**2) if the shape is always (,,3))

You can also use the sum method: ((A-B)**2).sum()

Right?

Andrew Jaffe
Bang. I'm having a slow day. Halves my running time.
Rock and or Roll
It's worth noting that for this you will have to use `numpy.sum`, not the builtin `sum`, which will find the sum over the first dimension and return a new array of one-dimension-lower.
Mike Graham
((A-B)**2).sum(-1) If you only want to add over the last axis, then the axis argument needs to be specified. Just using sum() adds all entries of the array (raveled first)
A: 

I do not know if the pow() function with power 2 will be fast. Try:

def ssd(A,B):
    s = 0
    for i in  range(3):
        s += sum((A[:,:,i] - B[:,:,i])*A[:,:,i] - B[:,:,i])
    return s
Ritsaert Hornstra
+1  A: 

I am confused why you are taking i in range(3). Is that supposed to be the whole array, or just part?

Overall, you can replace most of this with operations defined in numpy:

def ssd(A,B):
    squares = (A[:,:,:3] - B[:,:,:3]) ** 2
    return numpy.sum(squares)

This way you can do one operation instead of three and using numpy.sum may be able to optimize the addition better than the builtin sum.

Mike Graham
+1 This is how `scipy.stats.stats.ss` (sum of squares) does it.
unutbu
A: 

Further to Ritsaert Hornstra's answer that got 2 negative marks (admittedly I didn't see it in it's original form...)

This is actually true.

For a large number of iterations it can often take twice as long to use the '**' operator or the pow(x,y) method as to just manually multiply the pairs together. If necessary use the math.fabs() method if it's throwing out NaN's (which it sometimes does especially when using int16s etc.), and it still only takes approximately half the time of the two functions given.

Not that important to the original question I know, but definitely worth knowing.

Duncan Tait