ansaurus

Question

Rewriting a for loop in pure NumPy to decrease execution time

Answer 1

+2 A:

One obvious thing you can do is replace the line

r_test_fast = reshape_vector(r_test)

with

r_test_fast = r_test.reshape((3,1))

This probably won't make a big difference in performance, but it makes sense to use the NumPy built-ins instead of reinventing the wheel.
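
As a quick illustration of that replacement, here is a minimal sketch. The sample values in r_test are made up, and it assumes reshape_vector only turns a length-3 vector into a (3, 1) column, which is what the replacement line implies.

```python
import numpy as np

# Made-up sample vector; in the question r_test is one test position.
r_test = np.array([0.1, 0.2, 0.3])

# Built-in reshape: turn the length-3 vector into a (3, 1) column.
r_test_fast = r_test.reshape((3, 1))
print(r_test_fast.shape)  # (3, 1)

# For a contiguous array, reshape returns a view, so no data is copied.
is_view = np.shares_memory(r_test, r_test_fast)
print(is_view)
```

Because the result is a view, the reshape itself costs almost nothing, which fits the expectation that this change is about tidiness rather than speed.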

Generally speaking, as you have probably noticed by now, the trick to optimizing NumPy code is to express the algorithm as whole-array operations, or at least as slices, instead of iterating over each element in Python code. What tends to prevent this kind of "vectorization" is a so-called loop-carried dependency, i.e. a loop where each iteration depends on the result of a previous iteration. Looking briefly at your code, you have no such dependency, so it should be possible to vectorize it just fine.
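
To make the distinction concrete, here is a small sketch with made-up data (the arrays points and b and the factor 0.5 are illustrative, not taken from the question). The first loop has independent iterations and collapses into one whole-array expression. The second has a loop-carried dependency, because x[i] needs x[i - 1].

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 3))

# Independent iterations: each norm depends only on its own row,
# so the Python loop can be replaced by a whole-array expression.
norms_loop = np.empty(len(points))
for i in range(len(points)):
    norms_loop[i] = np.sqrt(np.dot(points[i], points[i]))

norms_vec = np.sqrt((points * points).sum(axis=1))
print(np.allclose(norms_loop, norms_vec))

# Loop-carried dependency: x[i] needs x[i - 1], so the iterations
# cannot simply be rewritten as one elementwise array operation.
b = rng.normal(size=1000)
x = np.empty_like(b)
x[0] = b[0]
for i in range(1, len(b)):
    x[i] = 0.5 * x[i - 1] + b[i]
```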

EDIT: One solution

I haven't verified that this is correct, but it should give you an idea of how to approach it.

First, take the cartesian() function, which is used further down to build the grid of fractional coordinates. Then define


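import numpy as np


def calculate_dipole_vect(mus, r_i, mom_i):
    # mus: (M, 3) test positions; r_i, mom_i: (N, 3) dipole positions and moments.
    # gamma_mu is expected to be defined at module level, as in the question.
    A = 1e-7
    Bs = []
    omega = []
    # Treat each mu sequentially
    for mu in mus:
        rel = mu - r_i                           # (N, 3)
        r_norm = np.sqrt((rel * rel).sum(1))     # (N,)
        r_unit = rel / r_norm[:, np.newaxis]     # (N, 3)

        # m . r_hat per dipole: sum over the component axis (axis 1), not axis 0
        m_dot_r = np.sum(mom_i * r_unit, 1)      # (N,)
        num = A * (3 * m_dot_r[:, np.newaxis] * r_unit - mom_i)
        den = r_norm ** 3
        B = np.sum(num / den[:, np.newaxis], 0)  # sum the contributions of all dipoles
        Bs.append(B)
        omega.append(gamma_mu * np.sqrt(np.dot(B, B)))
    return Bs, omega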
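
The cartesian() helper itself is not reproduced in this answer. As a stand-in, here is a minimal sketch built on np.meshgrid. It assumes the behavior the script needs (one row per combination of the inputs, with the last input varying fastest) and is not necessarily identical to the original function.

```python
import numpy as np

def cartesian(arrays):
    """Cartesian product of 1-D arrays, one combination per row.

    Hypothetical stand-in for the cartesian() helper mentioned above.
    """
    arrays = [np.asarray(a) for a in arrays]
    # indexing='ij' keeps the axis order equal to the input order
    grids = np.meshgrid(*arrays, indexing="ij")
    # stack along a new last axis, then flatten to (n_combinations, n_arrays)
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))

example = cartesian(([1, 2], [3, 4]))
print(example)
```

With the inputs ([1, 2], [3, 4]) the rows come out as (1, 3), (1, 4), (2, 3), (2, 4).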


import time

# n, a, r_i and mom_i are the variables defined in the question.
# Transpose to get more "natural" ordering with row-major numpy
r_i = r_i.T
mom_i = mom_i.T

t_start = time.perf_counter()  # time.clock() was removed in Python 3.8
r_frac = cartesian((np.arange(n[0]) / float(n[0]),
                    np.arange(n[1]) / float(n[1]),
                    np.arange(n[2]) / float(n[2])))
r_test = np.dot(r_frac, a)
B, omega = calculate_dipole_vect(r_test, r_i, mom_i)

print('Total time for vectorized: %f s' % (time.perf_counter() - t_start))

In my testing, this is in fact slightly slower than the loop-based approach I started from. The original version in the question was already vectorized, with whole-array operations over arrays of shape (20000, 3), so further vectorization brings little additional benefit. It may even hurt performance, as it does here, possibly because of the large temporary arrays it creates.
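
To show where large temporaries would come from, here is a sketch that removes the remaining Python loop by broadcasting over all test points at once. The function names, the random data, and the array sizes are illustrative assumptions, and no timings are claimed. The point is that every intermediate array now has shape (M, N, 3) for M test points and N dipoles.

```python
import numpy as np

def dipole_field_loop(mus, r_i, mom_i, A=1e-7):
    """One Python iteration per test point; temporaries are (N, 3)."""
    out = np.empty((len(mus), 3))
    for k, mu in enumerate(mus):
        rel = mu - r_i                                   # (N, 3)
        r_norm = np.sqrt((rel * rel).sum(axis=1))        # (N,)
        r_unit = rel / r_norm[:, np.newaxis]
        m_dot_r = (mom_i * r_unit).sum(axis=1)           # (N,)
        num = A * (3 * m_dot_r[:, np.newaxis] * r_unit - mom_i)
        out[k] = (num / r_norm[:, np.newaxis] ** 3).sum(axis=0)
    return out

def dipole_field_broadcast(mus, r_i, mom_i, A=1e-7):
    """No Python loop; every temporary is (M, N, 3) or (M, N)."""
    rel = mus[:, np.newaxis, :] - r_i[np.newaxis, :, :]  # (M, N, 3)
    r_norm = np.sqrt((rel * rel).sum(axis=2))            # (M, N)
    r_unit = rel / r_norm[:, :, np.newaxis]
    m_dot_r = (mom_i[np.newaxis, :, :] * r_unit).sum(axis=2)
    num = A * (3 * m_dot_r[:, :, np.newaxis] * r_unit - mom_i[np.newaxis, :, :])
    return (num / r_norm[:, :, np.newaxis] ** 3).sum(axis=1)  # (M, 3)

rng = np.random.default_rng(0)
r_i = rng.uniform(1.0, 2.0, size=(200, 3))     # made-up dipole positions
mom_i = rng.normal(size=(200, 3))              # made-up dipole moments
mus = rng.uniform(-0.5, 0.5, size=(50, 3))     # test points kept away from the dipoles

B_loop = dipole_field_loop(mus, r_i, mom_i)
B_bcast = dipole_field_broadcast(mus, r_i, mom_i)
print(np.allclose(B_loop, B_bcast))

# Size in bytes of one float64 (M, N, 3) temporary
temp_bytes = mus.shape[0] * r_i.shape[0] * 3 * 8
print(temp_bytes)
```

Each (M, N, 3) float64 temporary takes 24 * M * N bytes, and the broadcast version allocates several of them, so memory traffic grows quickly as the grid of test points gets finer. Processing the test points in chunks would be one way to bound that, at the cost of bringing back a short Python loop.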

janneb 2010-04-07 13:38:17

I think Justin's suggestion to profile was probably wise, but thanks very much for that. Though I'm not sure I'll use it, I think trying to understand that example is probably a very good way of learning. :)

Statto 2010-04-07 16:10:52

Answer 2

+2 A:

If you profile your code, you'll see that 99% of the running time is spent in calculate_dipole, so reducing the time spent in this loop won't give a noticeable reduction in execution time. You still need to focus on calculate_dipole itself if you want to make this faster. I tried my Cython code for calculate_dipole on this and got a reduction of about a factor of 2 in the overall time. There might be other ways to improve the Cython code too.
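
For anyone who wants to reproduce this kind of measurement, here is a minimal profiling sketch using the standard-library cProfile and pstats modules. The functions inner_work and outer_loop are hypothetical stand-ins for calculate_dipole and the loop around it, not the code from the question.

```python
import cProfile
import io
import pstats

def inner_work(n):
    # Hypothetical stand-in for the expensive per-point calculation
    return sum(i * i for i in range(n))

def outer_loop(repeats, n):
    # Hypothetical stand-in for the loop that calls it
    return [inner_work(n) for _ in range(repeats)]

profiler = cProfile.Profile()
profiler.enable()
outer_loop(50, 10000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

If the cumulative time of the inner function is close to that of the outer loop, the loop overhead is negligible and the inner function is the place to optimize.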

Justin Peel 2010-04-07 15:18:37
