views: 469

answers: 3
I want to emulate the functionality of gzcat | tail -n.

This would be helpful when dealing with huge files (a few GB or so). Can I tail the last few lines of such a file without decompressing it from the beginning? I suspect this won't be possible, since I'd guess that for gzip the encoding depends on all the preceding text.

Still, I'd like to hear whether anyone has tried something similar, or knows of a compression algorithm that could provide such a feature.

+9  A: 

No, you can't. gzip's compression algorithm works on a stream and adapts its internal coding to what the stream has contained so far, which is how it achieves its high compression ratio.

Without knowing what the contents of the stream are before a certain point, it's impossible to know how to go about de-compressing from that point on.

Any algorithm that allowed you to decompress arbitrary parts of the stream would require multiple passes over the data during compression.
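The point can be demonstrated with Python's zlib module (which implements the same DEFLATE compression gzip uses). This is only a sketch; the exact error depends on whatever bytes happen to sit at the chosen offset:

```python
# Starting decompression at an arbitrary offset fails, because the
# bytes there only make sense given everything decoded before them.
import zlib

data = zlib.compress(b"the quick brown fox jumps over the lazy dog " * 200)
try:
    zlib.decompress(data[len(data) // 2:])  # start half-way through
except zlib.error as exc:
    print("cannot decompress from mid-stream:", exc)
```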

Ben S
+1  A: 

Note that gzcat | tail still reads the entire file -- gzcat streams the whole decompressed output to tail, and tail merely keeps the last n lines.
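For reference, the same pipeline can be emulated in bounded memory, even though the whole stream is still decompressed. A minimal Python sketch (function name is mine):

```python
import gzip
from collections import deque

def gz_tail(path, n):
    """Emulate `gzcat path | tail -n N`: the entire stream is still
    read and decompressed, but only the last n lines are kept,
    using a bounded deque."""
    with gzip.open(path, "rt") as f:
        return list(deque(f, maxlen=n))
```

So memory stays constant, but I/O and decompression time remain proportional to the whole file, which is exactly the OP's complaint.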

I think he's aware of this, which is why he is asking how to _simulate_ those commands, without actually reading all the contents.
Ben S
+2  A: 

If you have control over what goes into the file in the first place, you could do something ZIP-like: store chunks of predetermined size as entries with filenames in increasing numerical order, and then decompress only the last chunk.
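A sketch of this chunking scheme, assuming we control the writer (the file names, chunk size, and helper functions here are illustrative, not an existing tool):

```python
import gzip
import os

CHUNK_LINES = 1000  # illustrative chunk size

def write_chunks(lines, directory):
    """Write lines into numbered gzip chunks of at most CHUNK_LINES each.
    Zero-padded names keep lexicographic order equal to numeric order."""
    os.makedirs(directory, exist_ok=True)
    for i in range(0, len(lines), CHUNK_LINES):
        name = os.path.join(directory, "chunk-%06d.gz" % (i // CHUNK_LINES))
        with gzip.open(name, "wt") as f:
            f.writelines(lines[i:i + CHUNK_LINES])

def tail_last_chunk(directory, n):
    """Tail by decompressing only the last chunk, not the whole data set."""
    last = sorted(os.listdir(directory))[-1]
    with gzip.open(os.path.join(directory, last), "rt") as f:
        return f.readlines()[-n:]
```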

Jared Updike
This sounds like a good compromise. However, the OP should be aware that this will lower the compression ratio. If testing shows that the ratio change is acceptable, this is a great idea.
Ben S
You can actually achieve this by resetting the compression dictionary part-way through a file, thus removing the need to split the file itself into chunks.
Nick Johnson
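The dictionary reset the comment above describes can be sketched with zlib's Z_FULL_FLUSH, which byte-aligns the stream and resets the compressor state, so a raw-DEFLATE decompressor can restart at any flush point. The flush-point offsets would have to be kept in some side index; here the (hypothetical) writer simply returns them:

```python
import zlib

def compress_with_sync_points(blocks):
    """Compress blocks into one raw-DEFLATE stream, doing a full flush
    after each block; returns (stream, byte offset of each block)."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # -15 = raw DEFLATE
    out, offsets, pos = [], [], 0
    for block in blocks:
        piece = comp.compress(block) + comp.flush(zlib.Z_FULL_FLUSH)
        offsets.append(pos)
        out.append(piece)
        pos += len(piece)
    out.append(comp.flush())  # finish the stream
    return b"".join(out), offsets

def decompress_from(stream, offset):
    """Decompress from a recorded flush point, with no earlier context."""
    return zlib.decompressobj(-15).decompress(stream[offset:])
```

With the offsets stored alongside the file, tailing becomes a seek to the last recorded offset plus inflating only that final segment, at the cost of a somewhat lower compression ratio (each segment starts with an empty dictionary).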