views: 152
answers: 2
I believe I am having a memory issue using numpy arrays. The following code is being run for hours on end:

    new_data = npy.array([new_x, new_y1, new_y2, new_y3])
    private.data = npy.row_stack([private.data, new_data])

where `new_x`, `new_y1`, `new_y2`, and `new_y3` are floats.

After about 5 hours of recording this data every second (more than 72000 floats), the program becomes unresponsive. What I think is happening is some kind of realloc and copy operation that is swamping the process. Does anyone know if this is what is happening?

I need a way to record this data without encountering this slowdown issue. There is no way to know even approximately the size of this array beforehand. It does not necessarily need to use a numpy array, but it needs to be something similar. Does anyone know of a good method?
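For reference, here is a stripped-down sketch of the loop that shows the append cost growing (`vstack` is equivalent to `row_stack` here; the row values are dummies):

```python
import time
import numpy as npy

def append_n_rows(n):
    """Append n rows one at a time, the way my recording loop does."""
    data = npy.zeros([1, 4])
    start = time.time()
    for i in range(n):
        new_data = npy.array([float(i), 0.0, 0.0, 0.0])
        data = npy.vstack([data, new_data])  # copies the whole array each time
    return time.time() - start

t_2k = append_n_rows(2000)
t_8k = append_n_rows(8000)
# 4x the rows takes far more than 4x the time once copying dominates.
print(t_2k, t_8k)
```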

+1  A: 

Update: I incorporated @EOL's excellent indexing suggestion into the answer.

The problem may be the way `row_stack` grows the destination: every call allocates a fresh array and copies all existing rows into it, so the cost of each append grows with the size of the array. You might be better off handling the reallocation yourself. The following code allocates one big array up front, fills it, and grows it by an hour's worth of rows whenever it fills up:

numcols = 4
growsize = 60*60 #60 samples/min * 60 min/hour
numrows = 3*growsize #3 hours, to start with
private.data = npy.zeros([numrows, numcols]) #alloc one big memory block
rowctr = 0
while (recording):
    private.data[rowctr] = npy.array([new_x, new_y1, new_y2, new_y3])
    rowctr += 1
    if (rowctr == numrows): #full, grow by another hour's worth of data
        private.data = npy.row_stack([private.data, npy.zeros([growsize, numcols])])
        numrows += growsize

This should keep the memory manager from thrashing around too much. I tried this against calling `row_stack` on each iteration, and it ran a couple of orders of magnitude faster.

mtrw
Good idea. `npy.empty` seems more appropriate than `npy.zeros` (and is probably a tad faster).
EOL
This is really fast. Encapsulating this in a class with a row_stack method would be nice.
EOL
Note that `private.data[rowctr] = …` is much faster than `[rowctr, :]`.
EOL
@EOL - thanks for the suggestions! I didn't realize you could index a whole row at a time like that. And it is much faster.
mtrw
@EOL - In my testing, it looks like `npy.zeros` is marginally (~3%) faster than `npy.empty` so I switched back to the former. But the indexing change you suggested made a 20% speed improvement.
mtrw
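Following up on EOL's encapsulation suggestion, a growable buffer along these lines could be wrapped in a small class. This is only a sketch (the class name, `growsize` default, and `finished` method are made up; `vstack` stands in for `row_stack`):

```python
import numpy as npy

class GrowableArray:
    """Preallocated 2-D buffer that grows by a fixed chunk when full."""

    def __init__(self, numcols, growsize=60 * 60):
        self.growsize = growsize
        self.data = npy.zeros([growsize, numcols])
        self.rowctr = 0

    def row_stack(self, row):
        if self.rowctr == self.data.shape[0]:
            # Full: grow by another chunk of zeroed rows.
            grown = npy.zeros([self.growsize, self.data.shape[1]])
            self.data = npy.vstack([self.data, grown])
        self.data[self.rowctr] = row
        self.rowctr += 1

    def finished(self):
        # Trim the unused, zero-filled tail.
        return self.data[:self.rowctr]

buf = GrowableArray(4, growsize=8)
for i in range(20):
    buf.row_stack([float(i), 1.0, 2.0, 3.0])
print(buf.finished().shape)  # (20, 4)
```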
+1  A: 

Use Python lists. Seriously: they are designed to grow efficiently (appends are amortized O(1), since the underlying storage over-allocates), and they work remarkably well in this setting.

If you need a numpy array at the end (or even occasionally in the midst of the computation), it is far more efficient to accumulate in a list first and convert once.
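For instance, a minimal sketch of the list-accumulation approach (with numpy imported as `npy` to match the question, and dummy sample values):

```python
import numpy as npy

rows = []
for i in range(5):
    new_x, new_y1, new_y2, new_y3 = float(i), 1.0, 2.0, 3.0
    rows.append((new_x, new_y1, new_y2, new_y3))  # amortized O(1), no copying

data = npy.array(rows)  # one conversion at the end
print(data.shape)  # (5, 4)
```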

dwf