In pure python you can grow matrices column by column pretty easily:

data = []
for i in something:
    newColumn = getColumnDataAsList(i)
    data.append(newColumn)

NumPy arrays don't have an append method, and hstack doesn't work on my empty initial array, so the following fails:

data = numpy.array([])
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    data = numpy.hstack((data, newColumn)) # ValueError: arrays must have same number of dimensions

So my options are either to move the initialization inside the loop, guarded by a condition:

data = None
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    if data is None:
        data = newColumn
    else:
        data = numpy.hstack((data, newColumn)) # works

... or to use a Python list and convert it to an array later:

data = []
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    data.append(newColumn)
data = numpy.array(data)

Both variants seem a little awkward to me. Are there nicer solutions?

A: 

Generally it is expensive to keep reallocating a NumPy array, so your third solution (build a Python list, then convert) is really the best performance-wise.
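A minimal sketch of that list-then-convert approach, assuming each column arrives as a 1-D array of equal length. Here get_column is a hypothetical stand-in for getColumnDataAsNumpyArray. numpy.column_stack places each 1-D array as a column, whereas numpy.array on the same list stores them as rows.

```python
import numpy as np

def get_column(i, n_rows=3):
    # Hypothetical stand-in for getColumnDataAsNumpyArray:
    # returns a 1-D array of length n_rows filled with i.
    return np.full(n_rows, i)

columns = [get_column(i) for i in range(4)]  # cheap list appends
data = np.column_stack(columns)              # one allocation at the end, shape (3, 4)
as_rows = np.array(columns)                  # same values, columns stored as rows, shape (4, 3)
```

If the row-wise layout is acceptable, numpy.array(columns) is enough; transpose it when a column-wise view is needed.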

However, I think hstack will do what you want; the clue is in the error message, ValueError: arrays must have same number of dimensions. I'm guessing that newColumn has two dimensions rather than being a 1-D vector, so data also needs two dimensions, e.g. data = np.empty((n, 0)) where n is the column length (np.array([[]]) has shape (1, 0), so it only lines up with single-row columns). Alternatively, make newColumn a 1-D vector with np.squeeze(newColumn); generally, if things are 1-D it is better to keep them 1-D in NumPy so that broadcasting etc. works better. In that case hstack works with your original definition of data, although it joins the 1-D arrays end to end into one long 1-D array.
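A small sketch of the dimension mismatch, using a made-up 3-row column. The empty 2-D start array np.empty((3, 0)) assumes the column length is known in advance; it is an illustration, not necessarily what the original code needs.

```python
import numpy as np

col_2d = np.array([[1], [2], [3]])   # shape (3, 1): a 2-D column
empty_1d = np.array([])              # shape (0,): 1-D, as in the question

try:
    np.hstack((empty_1d, col_2d))    # 1-D with 2-D: dimension mismatch
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# An empty 2-D array with a matching row count can be extended column by column.
data = np.empty((3, 0))
for _ in range(2):
    data = np.hstack((data, col_2d))

# With 1-D inputs hstack succeeds, but the result is one long 1-D array.
flat = np.hstack((empty_1d, np.squeeze(col_2d)))
```

Note that each hstack call in the loop still allocates a new array, so this fixes the error but not the reallocation cost.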

thrope
+2  A: 

Usually you don't keep resizing a NumPy array while you build it. What don't you like about your third solution? If it's a very large matrix/array, it might be worth allocating the array before you start assigning its values:

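x = len(something)
y = getColumnDataAsNumpyArray.someLengthProperty  # placeholder for the known column length

data = numpy.zeros((x, y))
for k, i in enumerate(something):
    # index by position k, not by the item i; each column is stored as row k
    data[k] = getColumnDataAsNumpyArray(i)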
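A runnable sketch of the same preallocation idea, laid out so that each result ends up as an actual column. The sizes and get_column are hypothetical stand-ins, and np.empty is used here instead of zeros only because every cell is overwritten.

```python
import numpy as np

n_rows, n_cols = 3, 4                 # assumed to be known in advance

def get_column(j):
    # Hypothetical stand-in for getColumnDataAsNumpyArray.
    return np.arange(n_rows) * 10 + j

data = np.empty((n_rows, n_cols))     # allocate once; contents are overwritten below
for j in range(n_cols):
    data[:, j] = get_column(j)        # fill column j in place, no reallocation
```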
bpowah
+1  A: 

NumPy actually does have an append function, which it seems might do what you want, e.g.,

import numpy as NP
my_data = NP.random.randint(0, 10, 9).reshape(3, 3)  # random_integers is deprecated; randint's upper bound is exclusive
new_col = NP.array((5, 5, 5)).reshape(3, 1)
res = NP.append(my_data, new_col, axis=1)

Your second snippet (hstack) will work if you add another line, e.g.,

my_data = NP.random.randint(0, 10, 16).reshape(4, 4)  # random_integers is deprecated; randint's upper bound is exclusive
# the line to add--does not depend on array dimensions
new_col = NP.zeros_like(my_data[:,-1]).reshape(-1, 1)
res = NP.hstack((my_data, new_col))

For 2-D arrays, hstack gives the same result as concatenate((my_data, new_col), axis=1); I'm not sure whether they perform the same.
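A quick check of that equivalence on a small fixed array (the values are made up for illustration). For 2-D inputs, append with axis=1, hstack, and concatenate with axis=1 should all produce the same array.

```python
import numpy as np

my_data = np.arange(9).reshape(3, 3)
new_col = np.array([5, 5, 5]).reshape(3, 1)

via_append = np.append(my_data, new_col, axis=1)
via_hstack = np.hstack((my_data, new_col))
via_concat = np.concatenate((my_data, new_col), axis=1)
```

All three return a new array rather than modifying my_data in place, so calling any of them in a loop copies the data on every iteration.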

doug