ansaurus

Question

numpy convert categorical string arrays to an integer array

Answer 1

+2 A:

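Well, this is a hack... but does it help? It assumes a holds one-byte strings (dtype '|S1') made of lowercase ASCII letters, so that viewing the bytes as unsigned integers gives 97 for 'a', 98 for 'b', and so on.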

In [72]: c=(a.view(np.ubyte)-96).astype('int32')

In [73]: print(c,c.dtype)
(array([1, 2, 3, 1, 2, 3]), dtype('int32'))
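
For anyone who wants to run this as a script, here is a minimal sketch of the same trick. It assumes the labels are single lowercase ASCII letters and forces dtype 'S1' explicitly (on Python 3, NumPy would otherwise pick a multi-byte unicode dtype and the byte view would not line up). The names codes and c are just for illustration, and it widens to int32 before subtracting, which is a small variation on the one-liner above.

```python
import numpy as np

# Assumption: every label is exactly one lowercase ASCII letter, stored as a
# one-byte string (dtype 'S1'), so each element occupies a single byte.
a = np.array(['a', 'b', 'c', 'a', 'b', 'c'], dtype='S1')

# Reinterpret each byte as an unsigned integer: b'a' is 97, b'b' is 98, ...
codes = a.view(np.ubyte)

# Widen to int32 first, then shift so that 'a' maps to 1.
c = codes.astype('int32') - 96

print(c, c.dtype)
```

This breaks as soon as a label is longer than one character or is not a lowercase letter, so it is probably only worth using when the input is known to have that shape.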

unutbu 2010-07-03 19:15:51

Answer 2

+1 A:

One way is to use the categorical function from scikits.statsmodels. For example:

In [60]: from scikits.statsmodels.tools import categorical

In [61]: a = np.array( ['a', 'b', 'c', 'a', 'b', 'c'])

In [62]: b = categorical(a, drop=True)

In [63]: b.argmax(1)
Out[63]: array([0, 1, 2, 0, 1, 2])

The return value from categorical (b) is actually a design matrix with one indicator column per category, hence the call to argmax(1) above: it picks the column index of the 1 in each row, which gets it close to your desired format.

In [64]: b
Out[64]:
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
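
To make the design-matrix idea concrete without depending on the statsmodels function, here is a rough NumPy-only sketch. It is not how categorical is implemented; it just builds the same kind of indicator matrix by broadcasting, and the names levels, design and codes are made up for the example.

```python
import numpy as np

a = np.array(['a', 'b', 'c', 'a', 'b', 'c'])

# Sorted distinct labels; each one becomes a column of the indicator matrix.
levels = np.unique(a)

# Broadcasting compares every element of `a` with every level, giving a
# (len(a), len(levels)) matrix with exactly one 1.0 per row.
design = (a[:, None] == levels[None, :]).astype(float)

# argmax along axis 1 returns the column index of the single 1.0 in each row.
codes = design.argmax(1)

print(design)
print(codes)
```

Because every row holds a single 1.0, argmax(1) recovers the category index unambiguously.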

ars 2010-07-10 05:12:10

Neat and clever. Thanks.

unutbu 2010-07-10 11:34:04

Answer 3

+1 A:

np.unique has some optional return values.

return_inverse gives the integer encoding, which I use very often:

>>> b, c = np.unique(a, return_inverse=True)
>>> b
array(['a', 'b', 'c'], 
      dtype='|S1')
>>> c
array([0, 1, 2, 0, 1, 2])
>>> c+1
array([1, 2, 3, 1, 2, 3])

It can also be used to recreate the original array from the unique values:

>>> b[c]
array(['a', 'b', 'c', 'a', 'b', 'c'], 
      dtype='|S1')
>>> (b[c] == a).all()
True
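
The same call works for labels of any length, not just single characters. A small sketch, wrapped in a helper whose name (encode_labels) and sample colour labels are only illustrative:

```python
import numpy as np

def encode_labels(labels):
    """Return (levels, codes) such that levels[codes] equals the input."""
    # np.unique sorts the distinct values; return_inverse gives, for each
    # input element, its index into that sorted array.
    levels, codes = np.unique(np.asarray(labels), return_inverse=True)
    return levels, codes

levels, codes = encode_labels(['red', 'green', 'blue', 'green', 'red'])

print(levels)         # distinct labels in sorted order
print(codes)          # integer code for each input element
print(levels[codes])  # round trip back to the original labels
```

Note that the codes follow the sorted order of the labels, not the order in which they first appear.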

2010-07-14 20:24:54
