I have a NumPy array that looks like:

>>> a
array([[ 3. ,  2. , -1. ],
       [-1. ,  0.1,  3. ],
       [-1. ,  2. ,  3.5]])

I would like to select a value from each row at random, excluding the -1 values from the sampling.

What I do currently is:

import numpy
import random

x = []
for i in range(a.shape[0]):
    # column indices of the values that may be sampled (here, the positive ones)
    idx = numpy.where(a[i, :] > 0)[0]
    # draw one of those indices at random
    idxr = random.sample(idx, 1)[0]
    x.append(a[i, idxr])

and get

>>> x
[3.0, 3.0, 2.0]

This is becoming a bit slow for large arrays, and I would like to know whether there is a way to conditionally select random values from the original a matrix without dealing with each row individually.

+2  A: 

I really don't think you will find anything in NumPy that does exactly what you are asking out of the box, so I've decided to offer what optimizations I could think up.

There are several things that could be making this slow. First off, numpy.where() is rather slow: it has to check every value in the sliced array (and the slice itself is created anew for each row), and then it allocates a new array of indices. If you plan on doing this process over and over again on the same matrix, the best thing you could do is sort each row first. Then you would just use a binary search to find where the positive values start and a single random number to select a value from that range. Of course, you could also just store the indices where the positive values start after finding them once with binary searches; a sketch of this idea follows.
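To make the sort-and-binary-search idea concrete, here is a rough sketch in plain NumPy. The function name is mine, and it assumes every row contains at least one positive value and that you only need the sampled values, not their original column positions:

import numpy as np

def sample_positive_sorted(a):
    # sort each row once; all positive values end up in a contiguous
    # block at the right end of each sorted row
    s = np.sort(a, axis=1)
    # binary-search each sorted row for the first value > 0
    starts = np.array([np.searchsorted(row, 0, side='right') for row in s])
    m = s.shape[1]
    # draw one uniform random column per row from its positive block
    offsets = (np.random.rand(len(s)) * (m - starts)).astype(int)
    return s[np.arange(len(s)), starts + offsets]

If you sample from the same matrix repeatedly, s and starts can be computed once and reused, so each subsequent draw reduces to the last two lines.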

If you don't plan on doing this process many times over, then I would recommend using Cython to speed up the numpy.where line. Cython would let you skip slicing out each row and speed up the process overall.
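Since a Cython version needs a compile step, here is instead a fully vectorized plain-NumPy sketch of the same goal (no per-row numpy.where call and no row slicing). This is my own alternative rather than what the answer prescribes, and it again assumes every row has at least one admissible value:

import numpy as np

def sample_positive_vectorized(a):
    mask = a > 0                               # one pass over the whole array
    counts = mask.sum(axis=1)                  # admissible values per row
    # a random ordinal in [0, counts[i]) for every row at once
    picks = (np.random.rand(a.shape[0]) * counts).astype(int)
    # running count of admissible values along each row; the first column
    # whose running count exceeds the pick is the chosen one
    cols = (mask.cumsum(axis=1) > picks[:, None]).argmax(axis=1)
    return a[np.arange(a.shape[0]), cols]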

My last suggestion is to use random.choice rather than random.sample unless you really do plan on drawing samples larger than one element.
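For instance, with the idx array from the question's loop:

idxr = random.choice(idx)           # one element, directly
# equivalent to, but cheaper than:
# idxr = random.sample(idx, 1)[0]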

Justin Peel
I'll be doing this process on similar but newly generated arrays many times over, so I'll look into Cython. Thanks!
fideli