Python is very nice for this kind of data processing, especially if your samples are "rows" and you can process each row independently:
    row1
    row2
    row3
    etc.
In fact, your program can have a very small memory footprint thanks to generators and generator expressions, which you can read about here: http://www.dabeaz.com/generators/ (it's not basic material, but it shows some mind-twisting applications of generators).
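A minimal sketch of such a row-by-row generator pipeline, assuming a hypothetical input format of one `name,value` pair per line (the helper names `read_rows`, `parse` and `total` are made up for illustration). Each stage pulls one row at a time from the previous one, so the whole input is never held in memory at once:

```python
import io


def read_rows(f):
    # Yield one stripped, non-empty line at a time; the file is never read whole.
    for line in f:
        line = line.strip()
        if line:
            yield line


def parse(rows):
    # Hypothetical row format: "name,value".
    for row in rows:
        name, value = row.split(",")
        yield name, float(value)


def total(records):
    # sum() consumes the generator chain lazily, one record at a time.
    return sum(value for _, value in records)


# io.StringIO stands in for a real (possibly huge) file object.
data = io.StringIO("a,1.5\nb,2.5\n\nc,3.0\n")
result = total(parse(read_rows(data)))
print(result)  # 7.0
```

With a real file you would pass the object returned by `open(...)` instead of the `StringIO`; iterating over a file object already yields one line at a time.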
Regarding S.Lott's answer, you probably want to avoid applying filter() to a sequence of rows: in Python 2, filter() builds the whole result list in memory, so it might blow up your computer if you pass it a long enough sequence (try filter(None, itertools.count()), after saving all your data :-)). In Python 3, filter() returns a lazy iterator, so the problem does not arise there. It's much better to replace filter with something like this:
    def filter_generator(func, sequence):
        for item in sequence:
            # With func=None, keep truthy items (the same as filter(None, ...)).
            if (item if func is None else func(item)):
                yield item
or, shorter:

    filtered_sequence = (item for item in sequence if (item if func is None else func(item)))
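To see that the generator version really is lazy, here is a small sketch that feeds it the same infinite itertools.count() that makes a list-building filter() run out of memory. itertools.islice takes only the first few matches, so only those items are ever produced:

```python
import itertools


def filter_generator(func, sequence):
    # Same semantics as filter(): with func=None, keep truthy items.
    for item in sequence:
        if (item if func is None else func(item)):
            yield item


# itertools.count() never ends, but only the items we ask for are generated.
evens = filter_generator(lambda n: n % 2 == 0, itertools.count())
first_five = list(itertools.islice(evens, 5))
print(first_five)  # [0, 2, 4, 6, 8]

# With func=None, falsy items (here 0) are skipped.
truthy = list(itertools.islice(filter_generator(None, itertools.count()), 3))
print(truthy)  # [1, 2, 3]
```

On Python 3 the built-in filter() behaves the same way here, since it is also lazy.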
This can be further optimized by moving the condition check out of the loop, but that is an exercise for the reader :-)
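For readers who want to compare notes on the exercise, one possible sketch (not the only way) decides once, before iterating, which test to apply, instead of re-checking `func is None` for every item. Note that it is a plain function returning a generator expression, so it stays just as lazy:

```python
def filter_generator(func, sequence):
    # The "func is None" check runs once, not once per item.
    if func is None:
        return (item for item in sequence if item)
    return (item for item in sequence if func(item))


print(list(filter_generator(None, [0, 1, "", "x", None])))    # [1, 'x']
print(list(filter_generator(str.isdigit, ["1", "a", "22"])))  # ['1', '22']
```

An equally short variant is to write `if func is None: func = bool` and keep a single loop; the version above avoids even that extra function call per item.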