views:

136

answers:

2

I'm looking for a fast way to interconvert between linear and multidimensional indexing in Numpy.

To make my usage concrete, I have a large collection of N particles, each assigned 5 float values (dimensions) giving an Nx5 array. I then bin each dimension using numpy.digitize with an appropriate choice of bin boundaries to assign each particle a bin in the 5 dimensional space.

N = 10
ndims = 5
p = numpy.random.normal(size=(N,ndims))
for idim in xrange(ndims):
    bbnds[idim] = numpy.array([-float('inf')]+[-2.,-1.,0.,1.,2.]+[float('inf')])

binassign = ndims*[None]
for idim in xrange(ndims):
    binassign[idim] = numpy.digitize(p[:,idim],bbnds[idim]) - 1

binassign then contains rows that correspond to the multidimensional index. If I then want to convert the multidimensional index to a linear index, I think I would want to do something like:

linind = numpy.arange(6**5).reshape(6,6,6,6,6)

This would give a look-up for each multidimensional index to map it to a linear index. You could then go back using:

mindx = numpy.unravel_index(x,linind.shape)

Where I'm having difficulties is figuring out how to take binassign (the Nx5 array) containing the multidimensional index in each row, and coverting that to an 1d linear index, by using it to slice the linear indexing array linind.

If anyone has a one (or several) line indexing trick to go back and forth between the multidimensional index and the linear index in a way that vectorizes the operation for all N particles, I would appreciate your insight.

+1  A: 

You can simply calculate the index of each bin:

box_indices = numpy.dot(ndims**numpy.arange(ndims), binassign)

The scalar product simply does 1*x0 + 5*x1 + 5*5*x2 +… This is done very efficiently through NumPy's dot().

EOL
Thanks, I generalized your suggestion in my own solution.
Josh
+1  A: 

Although I very much like EOL's answer, I wanted to generalize it a bit for non-uniform numbers of bins along each direction, and also to highlight the differences between C and F styles of ordering. Here is an example solution:

ndims = 5
N = 10

# Define bin boundaries 
binbnds = ndims*[None]
nbins = []
for idim in xrange(ndims):
    binbnds[idim] = numpy.linspace(-10.0,10.0,numpy.random.randint(2,15))
    binbnds[idim][0] = -float('inf')
    binbnds[idim][-1] = float('inf')
    nbins.append(binbnds[idim].shape[0]-1)

nstates = numpy.cumprod(nbins)[-1]

# Define variable values for N particles in ndims dimensions
p = numpy.random.normal(size=(N,ndims))

# Assign to bins along each dimension
binassign = ndims*[None]
for idim in xrange(ndims):
    binassign[idim] = numpy.digitize(p[:,idim],binbnds[idim]) - 1

binassign = numpy.array(binassign)

# multidimensional array with elements mapping from multidim to linear index
# Two different arrays for C vs F ordering
linind_C = numpy.arange(nstates).reshape(nbins,order='C')
linind_F = numpy.arange(nstates).reshape(nbins,order='F')

and now make the conversion

# Fast conversion to linear index
b_F = numpy.cumprod([1] + nbins)[:-1]
b_C = numpy.cumprod([1] + nbins[::-1])[:-1][::-1]

box_index_F = numpy.dot(b_F,binassign)
box_index_C = numpy.dot(b_C,binassign)

and to check for correctness:

# Check
print 'Checking correct mapping for each particle F order'
for k in xrange(N):
    ii = box_index_F[k]
    jj = linind_F[tuple(binassign[:,k])]
    print 'particle %d %s (%d %d)' % (k,ii == jj,ii,jj)

print 'Checking correct mapping for each particle C order'
for k in xrange(N):
    ii = box_index_C[k]
    jj = linind_C[tuple(binassign[:,k])]
    print 'particle %d %s (%d %d)' % (k,ii == jj,ii,jj)

And for completeness, if you want to go back from the 1d index to the multidimensional index in a fast, vectorized-style way:

print 'Convert C-style from linear to multi'
x = box_index_C.reshape(-1,1)
bassign_rev_C = x / b_C % nbins 

print 'Convert F-style from linear to multi'
x = box_index_F.reshape(-1,1)
bassign_rev_F = x / b_F % nbins

and again to check:

print 'Check C-order'
for k in xrange(N):
    ii = tuple(binassign[:,k])
    jj = tuple(bassign_rev_C[k,:])
    print ii==jj,ii,jj

print 'Check F-order'
for k in xrange(N):
    ii = tuple(binassign[:,k])
    jj = tuple(bassign_rev_F[k,:])
    print ii==jj,ii,jj 
Josh