ansaurus

Question

Answer 1

Look into numpy.unique and numpy.bincount.

E.g.

import numpy as np
# np.int was removed from recent NumPy releases; the builtin int does the same job
x = (np.random.random(100) * 5).astype(int)
unique_vals, indices = np.unique(x, return_inverse=True)
counts = np.bincount(indices)

print(unique_vals, counts)
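
To make the unique/bincount combination easier to follow, here is a small sketch with a fixed input instead of random data (the array values are made up for illustration):

```python
import numpy as np

x = np.array([3, 1, 1, 4, 3, 3])
unique_vals, indices = np.unique(x, return_inverse=True)
# unique_vals holds the sorted distinct values; indices maps each element of x
# to its position in unique_vals, so unique_vals[indices] reconstructs x.
counts = np.bincount(indices)

print(unique_vals)  # [1 3 4]
print(indices)      # [1 0 0 2 1 1]
print(counts)       # [2 3 1]
```

Counting how often each index occurs is the same as counting how often each unique value occurs, which is why bincount works here.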

Edit: Sorry, I misread your question...

One way to get the unique rows is to view each row as a single element of a structured array, so that numpy.unique compares whole rows rather than individual values.

In your case, you have a 2D array of bools. So maybe something like this?

import numpy as np
numrows, numcols = 10, 3
x = np.random.random((numrows, numcols)) > 0.5
# View each row as one structured record, then flatten to 1D so bincount accepts the indices
x = x.view(','.join(numcols * ['i1'])).ravel()

unique_vals, indices = np.unique(x, return_inverse=True)
counts = np.bincount(indices)

print(unique_vals, counts)
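
If your NumPy is recent enough that numpy.unique accepts the axis and return_counts arguments (I believe both have been available for a long time now, but check your version), the structured-array view isn't needed. A minimal sketch, with a small hand-written array standing in for your data:

```python
import numpy as np

# Small fixed boolean array so the result is easy to check by eye.
x = np.array([[True, False, True],
              [False, False, True],
              [True, False, True],
              [True, False, True]])

# axis=0 treats each row as one item; return_counts skips the bincount step.
unique_rows, counts = np.unique(x, axis=0, return_counts=True)

for row, count in zip(unique_rows, counts):
    print(row, count)
```

The unique rows come back sorted, so don't rely on them appearing in their original order.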

Of course, there's nothing really wrong with the way you were originally doing it. Here is a slightly cleaner way to write your original function, using tuples as Justin suggested:

def unique_rows(data):
    unique = dict()
    for row in data:
        row = tuple(row)
        if row in unique:
            unique[row] += 1
        else:
            unique[row] = 1
    return unique

We can take this one step further and use a defaultdict:

from collections import defaultdict
def unique_rows(data):
    unique = defaultdict(int)
    for row in data:
        unique[tuple(row)] += 1
    return unique
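
On Python 2.7 or later, collections.Counter should do the same bookkeeping in one line. This is just a sketch of the same tuple-keyed idea, and the sample data is made up:

```python
from collections import Counter

def unique_rows(data):
    # Tuples are hashable, so each row can serve directly as a Counter key.
    return Counter(tuple(row) for row in data)

data = [[1, 0, 1], [0, 0, 1], [1, 0, 1]]
counts = unique_rows(data)
print(counts[(1, 0, 1)])  # 2
print(counts[(0, 0, 1)])  # 1
```

A Counter also returns 0 for rows it has never seen, rather than raising a KeyError.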

As it happens, either of these options appears to be faster than the "numpy-thonic" way of doing it. (I would have guessed the opposite!) Converting the rows to strings, as you did in your original example, is slow, though. You definitely want to compare tuples instead of strings.
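
If you want to check the timing on your own machine, something like the following rough sketch would do. The function names and array size are my own choices, and the NumPy side uses numpy.unique with axis=0 rather than the structured-array view, so treat it as an approximation of the comparison above. Results will vary with NumPy version and array shape.

```python
import timeit
from collections import defaultdict

import numpy as np

def count_with_dict(data):
    # Pure-Python approach: one tuple per row, counted in a defaultdict.
    counts = defaultdict(int)
    for row in data:
        counts[tuple(row)] += 1
    return counts

def count_with_numpy(data):
    # NumPy approach: let np.unique find and count the distinct rows.
    rows, counts = np.unique(data, axis=0, return_counts=True)
    return dict(zip(map(tuple, rows), counts))

rng = np.random.default_rng(0)
data = rng.random((1000, 3)) > 0.5

for func in (count_with_dict, count_with_numpy):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print(func.__name__, seconds)
```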

Joe Kington 2010-10-13 01:28:11

comparing row in numpy array