According to a section in this presumably accurate book,
A common use of pipes is to read a compressed file incrementally; that is, without uncompressing the whole thing at once. The following function takes the name of a compressed file as a parameter and returns a pipe that uses gunzip to decompress the contents:
    def open_gunzip(filename):
        cmd = 'gunzip -c ' + filename
        fp = os.popen(cmd)
        return fp
If you read lines from fp one at a time, you never have to store the uncompressed file in memory or on disk.
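To make the quoted claim concrete, here is a minimal sketch of what reading such a pipe one line at a time might look like. It uses `subprocess.Popen` rather than the book's `os.popen`. The helper names `stream_lines` and `count_lines_gz` and the file name `data.txt.gz` are hypothetical, not from the book, and the sketch assumes `gunzip` is available on the PATH.

```python
import subprocess


def stream_lines(cmd):
    """Yield lines of a command's standard output as they become available."""
    # stdout=PIPE connects the child's output to a pipe that we read from.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line


def count_lines_gz(filename):
    """Count the lines of a gzip file while holding only one line at a time."""
    # Assumption: gunzip is installed; -c writes decompressed data to stdout.
    return sum(1 for _ in stream_lines(["gunzip", "-c", filename]))
```

Calling `count_lines_gz("data.txt.gz")` would then walk the whole file without Python ever building a list of its lines; whether the data is buffered elsewhere is exactly the question below.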
Maybe I'm just interpreting this wrong, but I don't see how this is possible. Python has no means of pausing gunzip halfway through its output, right? I assume gunzip isn't going to block until a line of output is read before producing more, so some buffer has to be capturing all of it (whether inside the Python interpreter or in the OS, whether in memory or on disk), meaning the uncompressed file is being stored somewhere in full... right?
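The assumption that the writer never blocks can be tested directly. The sketch below is only an experiment under stated assumptions, not a description of how gunzip itself is implemented: a stand-in child process (a small Python script, not gunzip) tries to write 64 MiB to its standard output while the parent deliberately never reads from the pipe. The child records its running total in a scratch file, so the parent can see how far it got. The names `CHILD` and `bytes_written_while_unread` are hypothetical. On typical operating systems a pipe has a fixed-size kernel buffer and a write to a full pipe blocks, so the reported total would be expected to stall far below the limit.

```python
import os
import subprocess
import sys
import tempfile
import time

# Stand-in for gunzip: writes 4 KiB chunks to stdout and records its
# running total in the file named by its first argument.
CHILD = r"""
import os, sys
chunk = b"x" * 4096
total = 0
limit = int(sys.argv[2])
with open(sys.argv[1], "w") as progress:
    while total < limit:
        os.write(1, chunk)      # blocks while the pipe is full
        total += len(chunk)
        progress.seek(0)
        progress.write(str(total))
        progress.flush()
"""


def read_progress(path):
    """Return the child's last recorded byte count, or 0 if none yet."""
    try:
        with open(path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return 0
    return int(text) if text else 0


def bytes_written_while_unread(limit=64 * 1024 * 1024, settle=1.0):
    """Start a writer, never read its output, and report how far it got."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "progress.txt")
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, path, str(limit)],
            stdout=subprocess.PIPE,  # we hold the read end but never read it
        )
        try:
            # Wait (up to 10 s) for the child to start writing.
            deadline = time.time() + 10
            while read_progress(path) == 0 and time.time() < deadline:
                time.sleep(0.05)
            time.sleep(settle)  # ample time to write far more if unblocked
            return read_progress(path)
        finally:
            proc.kill()
            proc.stdout.close()
            proc.wait()


if __name__ == "__main__":
    print("bytes written before stalling:", bytes_written_while_unread())
```

If the printed number is much smaller than the 64 MiB limit, the writer was paused by the operating system rather than by Python, and the unread data never existed anywhere in full.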