views:

120

answers:

1

Given two arrays hashes and table, for each value in hashes I want to store the position of the element at the element's value's offset in the array table. Here is the naïve algorithm:

def insert_n(table,hashes):
    for x in xrange(len(hashes)):
        table[hashes[x]]=x

This is extremely slow. Psyco helps some here, but hardly.

Numpy has a solution:

numpy.insert(table,numpy.arange(len(hashes)),hashes)

But according to my benchmarks, this is still extremely slow for such a simple operation. Is there a faster way to perform this that can be used from python?

Some additional example code:

import numpy

from time import time

table_size=2**20

hashes_size=2**19

table=numpy.zeros(table_size,dtype=numpy.uint32)

hashes=numpy.fromstring(numpy.random.bytes((hashes_size)*4),
                        dtype=numpy.uint32)%table_size

t0=time()

numpy.insert(table,numpy.arange(len(hashes)),hashes)

print time()-t0
+2  A: 

This is fast and simple (assuming table and hashes are numpy.uint32 arrays):

table[hashes] = numpy.arange(len(hashes), dtype=numpy.uint32)

You may want to compare the speed with this:

table[hashes] = xrange(len(hashes))

By the way, numpy.insert does not do the same thing as the for-loop you posted.

interjay
Completely right, I was using insert when I meant to use put (which is equivalent to your slicing assignment.) You got it.