ansaurus

Question

Speeding up computations with numpy matrices

Answer 1

+1 A:

I can think of a couple of optimisations there. As you are using 4 nested Python "for" statements, you are about as slow as you can be.

I can't figure out exactly what you are looking for, but for one thing, if the density of "1"s in your big matrix is low, you can certainly use NumPy's "any" method on slices of bigMatrix to quickly check whether there are any set elements there (Python's built-in "any" iterates over rows and typically raises a ValueError on a 2-D slice). You could get a several-fold speed increase there:

step = len(smallMatrix[0])
for y in range(0, len(bigMatrix[0]), step):
    for x in range(0, len(bigMatrix), step):
        # skip blocks that contain no set elements
        if not bigMatrix[x: x+step, y: y + step].any():
            continue
        (...)

At this point, if you still need to iterate over each element, you add another pair of indexes to walk each position inside the step, but I think you get the idea.
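The per-block check can also be done for every block at once. What follows is a minimal sketch of a different technique (a reshape followed by a reduction), not the loop from this answer. It assumes both matrix dimensions are exact multiples of the step, and the helper name block_any is hypothetical.

```python
import numpy as np

def block_any(big, step):
    """Return a 2-D bool array that is True where a step x step block has a set element.

    Assumes both dimensions of `big` are exact multiples of `step`.
    """
    rows, cols = big.shape
    # Axes become (block row, row inside block, block column, column inside block).
    blocks = big.reshape(rows // step, step, cols // step, step)
    return blocks.any(axis=(1, 3))

big = np.zeros((6, 6), dtype=bool)
big[4, 1] = True          # lies in block row 1, block column 0
hits = block_any(big, 3)
print(hits)
```

Only the blocks flagged True in the result would then need the slower element-by-element walk.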

Apart from using NumPy's built-in operations like this "any" usage, you could certainly add some control flow code to break off the (b,a) loop when the first matching pixel is found (for example, by inserting a "break" statement inside your last "if" and another if..break pair for the "b" loop).

I really can't figure out exactly what your intent is, so I can't give you more specific code.

jsbueno 2009-11-26 13:36:49

I don't know how I could forget what you mention in your last paragraph, that break idea. But I need to break out of 2 loops. Is there any way to do it in Python without having to break the inner loop and use a flag to check whether I should break the outer one?

devoured elysium 2009-11-26 13:54:52
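One common way to leave two nested loops at once without a flag is to move them into a function and use "return". The sketch below assumes the goal is only to learn whether a block contains a set element; the function name block_has_hit is hypothetical and not taken from the question.

```python
import numpy as np

def block_has_hit(block):
    """Return True as soon as any element of the 2-D block is nonzero."""
    for row in block:
        for value in row:
            if value:
                return True   # leaves both loops at once, no flag needed
    return False

block = np.zeros((4, 4), dtype=bool)
block[2, 3] = True
print(block_has_hit(block))
```

For plain NumPy arrays, block.any() gives the same answer much faster; the explicit loops only pay off when extra per-element work has to happen before the exit.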

Answer 2

+1 A:

Your example code makes no sense, but the description of your problem sounds like you are trying to do a 2d convolution of a small bitarray over the big bitarray. There's a convolve2d function in the scipy.signal package that does exactly this. Just do convolve2d(bigMatrix, smallMatrix) to get the result. Unfortunately the scipy implementation doesn't have a special case for boolean arrays, so the full convolution is rather slow. Here's a function that takes advantage of the fact that the arrays contain only ones and zeroes:

import numpy as np

def sparse_convolve_of_bools(a, b):
    if a.size < b.size:
        a, b = b, a
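    offsets = list(zip(*np.nonzero(b)))  # list() so that len() works on Python 3
    n = len(offsets)
    # smallest integer type that can hold a count of up to n (np.int no longer exists)
    dtype = np.byte if n < 128 else np.short if n < 32768 else np.int64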
    result = np.zeros(np.array(a.shape) + b.shape - (1,1), dtype=dtype)
    for o in offsets:
        result[o[0]:o[0] + a.shape[0], o[1]:o[1] + a.shape[1]] += a
    return result

On my machine it runs in less than 9 seconds for a 3000x2000 by 20x20 convolution. The running time depends on the number of ones in the smaller array, at about 20 ms per nonzero element.
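To see what the shifted-add loop computes, it can be compared with the textbook definition of a full 2-D convolution on small inputs. This is only a sketch: the function is restated for Python 3 with a fixed int64 result, and the checker naive_full_convolve is a hypothetical helper written for this comparison.

```python
import numpy as np

def sparse_convolve_of_bools(a, b):
    # Same shifted-add idea as above, restated for Python 3 and current NumPy.
    if a.size < b.size:
        a, b = b, a
    offsets = list(zip(*np.nonzero(b)))
    result = np.zeros((a.shape[0] + b.shape[0] - 1,
                       a.shape[1] + b.shape[1] - 1), dtype=np.int64)
    for r, c in offsets:
        # Each set element of b contributes one shifted copy of a.
        result[r:r + a.shape[0], c:c + a.shape[1]] += a
    return result

def naive_full_convolve(a, b):
    # Direct definition of the full 2-D convolution, for checking only.
    out = np.zeros((a.shape[0] + b.shape[0] - 1,
                    a.shape[1] + b.shape[1] - 1), dtype=np.int64)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            for k in range(b.shape[0]):
                for l in range(b.shape[1]):
                    out[i + k, j + l] += a[i, j] * b[k, l]
    return out

rng = np.random.default_rng(0)
big = (rng.random((12, 9)) < 0.2).astype(np.int64)
small = (rng.random((3, 4)) < 0.5).astype(np.int64)
same = np.array_equal(sparse_convolve_of_bools(big, small),
                      naive_full_convolve(big, small))
print(same)
```

Each entry of the result counts how many set elements of the small array land on set elements of the big one at that offset, so a test such as result > 0 marks every position with at least one hit.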

Ants Aasma 2009-11-26 14:59:43

I don't get what that code does, but I've tried it and it seems to crash on the line result = numpy.zeros(numpy.array(a.shape) + b.shape - (1,1), dtype=dtype) with: ValueError: shape mismatch: objects cannot be broadcast to a single shape

devoured elysium 2009-11-26 15:53:52

Try substituting it with result = np.zeros((a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1), dtype=dtype)

dwf 2009-11-30 05:20:48

Answer 3

A:

If your bits are really packed 8 per byte / 32 per int, and you can reduce your smallMatrix to 20x16,
then try the following, shown here for a single row.
(Is newMatrix[x, y] = 1 when any bit of the 20x16 block around x,y is 1? What are you really looking for?)

python -m timeit -s '
""" slide 16-bit mask across 32-bit pairs bits[j], bits[j+1] """

import numpy as np

bits = np.zeros( 2000 // 16, np.uint16 )  # 2000 bits
bits[::8] = 1
mask = 32+16
nhit = 16 * [0]

def hit16( bits, mask, nhit ):
    """
        slide 16-bit mask across 32-bit pairs bits[j], bits[j+1]
        bits: long np.array( uint16 )
        mask: 16 bits, int
        out: nhit[j] += 1 where pair & mask != 0
    """
    left = int(bits[0])  # plain Python ints, so the 16-bit shift cannot overflow uint16
    for b in bits[1:]:
        b = int(b)
        pair = (left << 16) | b
        if pair:  # np idiom for non-0 words ?
            m = mask
            for j in range(16):
                if pair & m:
                    nhit[j] += 1
                    # hitposition = jb*16 + j
                m <<= 1
        left = b
    # if any(nhit):  print "hit16:", nhit

' \
'
hit16( bits, mask, nhit )
'

# 15 msec per loop, bits[::4] = 1
# 11 msec per loop, bits[::8] = 1
# mac g4 ppc
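The inner Python loops can also be pushed into NumPy. The sketch below is an assumed vectorized variant, not the timed code above: it builds all 32-bit pairs at once and counts hits per shift. The name hit16_vectorized is hypothetical, and mask is assumed to be a plain Python int of at most 16 bits.

```python
import numpy as np

def hit16_vectorized(bits, mask):
    """Count, for each shift j in 0..15, the adjacent pairs hit by mask << j."""
    bits = np.asarray(bits, dtype=np.uint16)
    # Widen to uint32 before shifting so the high word is not lost.
    high = bits[:-1].astype(np.uint32) << np.uint32(16)
    pairs = high | bits[1:].astype(np.uint32)
    nhit = np.zeros(16, dtype=np.int64)
    for j in range(16):
        shifted = np.uint32(mask << j)  # fits in 32 bits for a 16-bit mask
        nhit[j] = np.count_nonzero(pairs & shifted)
    return nhit

bits = np.zeros(2000 // 16, np.uint16)  # 2000 bits, as in the timing above
bits[::8] = 1
nhit = hit16_vectorized(bits, 32 + 16)
print(nhit)
```

Only the 16 shifts remain as a Python loop; the work over the row is done by whole-array operations.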

Denis 2009-12-04 15:39:12
