
I'd like to convert a list of record arrays, each with dtype (uint32, float32), into a NumPy array of dtype object:

X = np.array(instances, dtype=object)

where instances is a list of arrays with dtype np.dtype([('f0', '<u4'), ('f1', '<f4')]). However, the statement above results in an array whose elements are also of dtype object, so the record dtype is lost:

X[0]
array([(67111L, 1.0), (104242L, 1.0)], dtype=object)

Does anybody know why?

The following statements should be equivalent to the one above, but they give the desired result:

X = np.empty((len(instances),), dtype=object)
X[:] = instances
X[0]
array([(67111L, 1.0), (104242L, 1.0)], dtype=[('f0', '<u4'), ('f1', '<f4')])

thanks & best regards, peter

A:

Stéfan van der Walt (a numpy developer) explains:

The ndarray constructor does its best to guess what kind of data you are feeding it, but sometimes it needs a bit of help....

I prefer to construct arrays explicitly, so there is no doubt what is happening under the hood:

When you say something like

import numpy as np

rec = np.dtype([('f0', '<u4'), ('f1', '<f4')])
instance1 = np.array([(67111, 1.0), (104242, 1.0)], dtype=rec)
instance2 = np.array([(67112, 2.0), (104243, 2.0)], dtype=rec)
instances = [instance1, instance2]
Y = np.array(instances, dtype=object)

np.array is forced to guess the shape of the array you want. instances is a list of two objects, each of length 2, so, quite reasonably, np.array guesses that Y should have shape (2, 2):

print(Y.shape)
# (2, 2)
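The guess can be checked directly, including what it does to the records. This is a small sketch assuming Python 3 and a current NumPy (the builtin object is used because the np.object alias was removed in NumPy 1.24); the names rec and instances simply mirror the example above. Because np.array descends into each record array, every row of Y is a plain object array and the field names are gone.

```python
import numpy as np

# Record dtype from the question: uint32 field 'f0', float32 field 'f1'
rec = np.dtype([('f0', '<u4'), ('f1', '<f4')])
instances = [
    np.array([(67111, 1.0), (104242, 1.0)], dtype=rec),
    np.array([(67112, 2.0), (104243, 2.0)], dtype=rec),
]

# A list of 2 sequences, each of length 2: np.array descends into both
Y = np.array(instances, dtype=object)
print(Y.shape)            # (2, 2)
print(Y[0].dtype)         # object, not the record dtype
print(Y[0].dtype.names)   # None, because no fields survive in the rows
```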

In most cases, I think a (2, 2) array is what you would want. Since it is not what you want here, you must construct the array explicitly:

X = np.empty((len(instances),), dtype=object)
print(X.shape)
# (2,)

Now there is no question about X's shape, (2,), so when you feed in the data

X[:] = instances

numpy is smart enough to regard instances as a sequence of two objects.
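The same empty-then-fill idea can be wrapped in a small helper. This is a sketch under stated assumptions (Python 3, current NumPy); to_object_array is a hypothetical name, not a NumPy function. It fills the array one element at a time: assigning to a single integer index stores the object as-is, so each record array is kept whole with its dtype intact.

```python
import numpy as np

def to_object_array(items):
    """Hypothetical helper: store each item of `items` as one element of a 1-D object array."""
    out = np.empty(len(items), dtype=object)
    # Assigning to one integer index stores a reference to the object itself,
    # so NumPy never looks inside the record arrays.
    for i, item in enumerate(items):
        out[i] = item
    return out

rec = np.dtype([('f0', '<u4'), ('f1', '<f4')])
instances = [
    np.array([(67111, 1.0), (104242, 1.0)], dtype=rec),
    np.array([(67112, 2.0), (104243, 2.0)], dtype=rec),
]

X = to_object_array(instances)
print(X.shape)     # (2,)
print(X[0].dtype)  # the original record dtype is preserved
print(X[1]['f0'])  # fields are still accessible by name
```

The explicit loop is a little longer than the slice assignment X[:] = instances, but it makes the one-object-per-slot intent obvious to a reader.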

unutbu