views:

63

answers:

3

I'm writing some modelling routines in NumPy that need to select cells randomly from a NumPy array and do some processing on them. All cells must be selected without replacement (as in, once a cell has been selected it can't be selected again, but all cells must be selected by the end).

I'm transitioning from IDL where I can find a nice way to do this, but I assume that NumPy has a nice way to do this too. What would you suggest?

Update: As mentioned in my comment below, I should have stated that I'm trying to do this on 2D arrays, and therefore get a set of 2D indices back. Sorry!

+3  A: 

How about using numpy.random.shuffle or numpy.random.permutation if you still need the original array?

If you need to change the array in-place than you can create an index array like this:

your_array = <some numpy array>
index_array = numpy.arange(your_array.size)
numpy.random.shuffle(index_array)

print your_array[index_array[:10]]
WoLpH
Thanks for your answer. Looks like I should have mentioned in my question that this is 2D array...and I'd like to get the 2D array indices out for every cell, randomly without replacement. Is there a way to do that easily? I
robintw
@robintw - `numpy.random.shuffle` should work perfectly on n-dimensional arrays. If you want the indicies, you might try making row and column index arrays (look into `meshgrid`) and then shuffling them.
Joe Kington
@robintw: 2D arrays are no problem either, you can simply `reshape()` to get 2D instead of 1D :)
WoLpH
+1  A: 

Extending the nice answer from @WoLpH

For a 2D array I think it will depend on what you want or need to know about the indices.

You could do something like this:

data = np.arange(25).reshape((5,5))

x, y  = np.where( a = a)
idx = zip(x,y)
np.random.shuffle(idx)

OR

data = np.arange(25).reshape((5,5))

grid = np.indices(data.shape)
idx = zip( grid[0].ravel(), grid[1].ravel() )
np.random.shuffle(idx)

You can then use the list idx to iterate over randomly ordered 2D array indices as you wish, and to get the values at that index out of the data which remains unchanged.

Note: You could also generate the randomly ordered indices via itertools.product too, in case you are more comfortable with this set of tools.

dtlussier
A: 

Use random.sample to generates ints in 0 .. A.size with no duplicates, then split them to index pairs:

import random
import numpy as np

def randint2_nodup( nsample, A ):
    """ uniform int pairs, no dups:
        r = randint2_nodup( nsample, A )
        A[r]
        for jk in zip(*r):
            ... A[jk]
    """
    assert A.ndim == 2
    sample = np.array( random.sample( xrange( A.size ), nsample ))  # nodup ints
    return sample // A.shape[1], sample % A.shape[1]  # pairs


if __name__ == "__main__":
    import sys

    nsample = 8
    ncol = 5
    exec "\n".join( sys.argv[1:] )  # run this.py N= ...
    A = np.arange( 0, 2*ncol ).reshape((2,ncol))

    r = randint2_nodup( nsample, A )
    print "r:", r
    print "A[r]:", A[r]
    for jk in zip(*r):
        print jk, A[jk]
Denis