views: 40
answers: 2
Dear All,

While using numpy.ndarray, I ran into a memory overflow problem due to the size of the data. For example:

Suppose I have a 100000000 * 100000000 * 100000000 float64 array as the data source. When I try to read the data and process it in memory with NumPy, it raises a MemoryError, because storing such a big array uses up all the available memory.

Maybe using a disk file / database as a buffer to store the array is a solution: when I want to use the data, it fetches only the necessary parts from the file / database; otherwise, it stays a small Python object that takes little memory.

Is it possible to write such an adapter?
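Something like this rough sketch is what I have in mind, assuming the data sits in a raw binary file of float64 values (the class and file layout are only an illustration):

    import numpy as np

    # Rough sketch of the adapter idea: the object holds only metadata,
    # and __getitem__ reads the requested row from disk on demand.
    class DiskArray:
        def __init__(self, path, shape, dtype=np.float64):
            self.path, self.shape, self.dtype = path, shape, dtype
            self._row_bytes = shape[1] * np.dtype(dtype).itemsize

        def __getitem__(self, row):
            with open(self.path, 'rb') as f:
                f.seek(row * self._row_bytes)
                return np.fromfile(f, dtype=self.dtype, count=self.shape[1])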

Thanks.

Regards, KC

A: 

If you have matrices with lots of zeros, use scipy.sparse.csc_matrix. And it is certainly possible to write such an adapter; for example, you could subclass the numarray array class.
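For instance, a sparse matrix stores only its non-zero entries, so the dense array is never materialised in memory (shapes and values below are arbitrary):

    import numpy as np
    from scipy.sparse import csc_matrix

    # Build the sparse matrix directly from coordinates.
    rows = np.array([0, 123, 9999])
    cols = np.array([0, 456, 9999])
    vals = np.array([1.0, 2.5, -3.0])
    m = csc_matrix((vals, (rows, cols)), shape=(10000, 10000))
    print(m.nnz)        # 3 stored values instead of 10**8
    print(m[123, 456])  # 2.5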

iddqd
thanks, but that may not be what I need. What I want to do is something like the Linux command "cat big_file | more", so that I can use a paging function to seek to different pages, roughly like the sketch below.
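A sketch only; the file layout (raw float64) and the function name are made up:

    import numpy as np

    # Hypothetical pager over a raw float64 file: seek to the requested
    # page and read only that slice into memory.
    def read_page(path, page, page_size):
        with open(path, 'rb') as f:
            f.seek(page * page_size * 8)  # 8 bytes per float64
            return np.fromfile(f, dtype=np.float64, count=page_size)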
K. C
+1  A: 

Take a look at PyTables or numpy.memmap; maybe they fit your needs.
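A minimal numpy.memmap sketch (the file name and shape are arbitrary): the array lives on disk, and only the slices you touch are paged into RAM.

    import numpy as np

    shape = (100000, 1000)  # ~0.8 GB of float64 on disk, not in RAM
    mm = np.memmap('big_array.dat', dtype=np.float64, mode='w+', shape=shape)
    mm[0, :10] = np.arange(10)   # writes go to the file-backed buffer
    mm.flush()

    # reopen read-only; slicing reads only the touched pages
    ro = np.memmap('big_array.dat', dtype=np.float64, mode='r', shape=shape)
    print(ro[0, :10])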

best, Peter

Peter Prettenhofer
thanks, but they are not quite what I need. I want to write an SQLite-based numpy.array reader that is driven by the SQL language. The closest fit is ATpy; however, it cannot "lazy read"... What I am after is roughly this (a sketch only; the database path, query, and chunk size are made up):

    import sqlite3
    import numpy as np

    # Hypothetical sqlite-backed lazy reader: fetch rows in chunks with
    # a cursor instead of loading the whole table at once.
    def lazy_read(db_path, query, chunk_rows=10000):
        conn = sqlite3.connect(db_path)
        try:
            cur = conn.execute(query)
            while True:
                rows = cur.fetchmany(chunk_rows)
                if not rows:
                    break
                yield np.asarray(rows, dtype=np.float64)
        finally:
            conn.close()
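Usage would be iterating pages without ever holding the whole table in RAM:

    # for chunk in lazy_read('data.db', 'SELECT x, y FROM samples'):
    #     process(chunk)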
K. C