I'm writing a data-processing library in Python that reads data from a variety of sources, manipulates it, and then exports it to a variety of formats. Until now I've been loading everything into memory, but some of the datasets I'm processing are particularly large (over 4 GB).
I need an open-source library for a backing store that can deal elegantly with large datasets. It needs to support altering the data structure dynamically (adding, renaming, and removing columns) and reasonably fast iteration. Ideally, it would handle arbitrary-sized strings and integers just as Python does, but I can build that into the library if needed. It also needs to handle missing values.
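For concreteness, here's a minimal sketch of the operations I have in mind, written against the standard-library sqlite3 module purely as an illustration. The file name, table, and column names are placeholders, and SQLite itself is just one candidate, not something I've settled on:

```python
import sqlite3

# Hypothetical backing-store file; ":memory:" would also work for testing.
conn = sqlite3.connect("backing_store.db")
cur = conn.cursor()

# Create a table and load some rows, including a missing value (NULL).
cur.execute("CREATE TABLE data (name TEXT, value INTEGER)")
cur.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("alpha", 1), ("beta", None), ("gamma", 3)],
)

# Dynamic schema changes: add, rename, and remove columns.
cur.execute("ALTER TABLE data ADD COLUMN notes TEXT")
cur.execute("ALTER TABLE data RENAME COLUMN value TO amount")  # needs SQLite >= 3.25
cur.execute("ALTER TABLE data DROP COLUMN notes")              # needs SQLite >= 3.35

# Reasonably fast iteration without holding the whole dataset in memory.
for name, amount in cur.execute("SELECT name, amount FROM data"):
    print(name, amount)  # amount is None where the value was missing

conn.commit()
conn.close()
```

One caveat with this particular illustration: SQLite's TEXT values can be arbitrarily long, but its INTEGER type is limited to 64 bits, so arbitrary-precision integers would be one of the things I'd have to build on top.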
Does anyone have any suggestions?