ansaurus

Question

get the filesize of very large .gz file on a 64bit platform

Answer 1

+1 A:

I haven't tried this with a file of the size you mentioned, but I often find the uncompressed size of a .gz file with

zcat file.gz | wc -c

when I don't want to leave the uncompressed file lying around, or bother to compress it again.

Obviously, the data is uncompressed, but is then piped to wc.

It's worth a try, anyway.
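For anyone who wants the same count from inside a program rather than a shell pipeline, here is a minimal sketch of the equivalent in Python. The function name `uncompressed_size` and the 1 MiB chunk size are just illustrative choices, not anything from the question. It streams the decompressed data and throws each chunk away after counting it, so nothing is written to disk and memory use stays small.

```python
import gzip

def uncompressed_size(path, chunk_size=1 << 20):
    """Count decompressed bytes by streaming; only one chunk is held at a time."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of stream
                break
            total += len(chunk)
    return total
```

Like zcat | wc -c, this still does a full decompression; it only avoids keeping the result.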

EDIT: When I tried creating a 5G file with data from /dev/random, it produced a file named 5G of size 5120000000 bytes, although my file manager reported this as 4.8G (5120000000 bytes is about 4.77 GiB).

Then I compressed it with gzip 5G; the resulting 5G.gz was about the same size (random data barely compresses).

Then zcat 5G.gz | wc -c reported the same size as the original file: 5120000000 bytes. So my suggestion seemed to have worked for this trial, anyway.

Thanks for waiting

pavium 2009-12-27 09:24:35

Yes, thanks, but my question was more about how to get the uncompressed file size without actually doing a decompression. For files whose uncompressed size fits in 32 bits, you can just extract the last 4 bytes. This is not possible for larger files, and, as you have done, the only way is to do a decompression.

monkeyking 2009-12-28 07:52:56

But my method performed a decompression which didn't affect the original compressed file, and didn't create an extra uncompressed file. There would be no cleaning up afterward. And I think it's worth noting that the answer you accepted said that decompression was the *only* way to get the exact size. It makes sense that *the only way to find out what's in the box, is to open it*.

pavium 2009-12-28 08:36:22

Yes, it didn't affect the original file, but my concern was not "not touching" the file, merely speed. If I want to allocate an array for the entire data, I need to know the size first. That requires one decompression to get the size, followed by another for the actual data copy. This is not necessary if the file is smaller than 2.1 gig. Standard gunzip can also decompress to stdout: gunzip -c file | wc -c. But thanks for your input :)
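One possible way around the double decompression, sketched here in Python under the assumption that the data fits in memory: decompress once and let the buffer grow as chunks arrive, instead of sizing it up front. The name `read_all` and the chunk size are hypothetical. In C the same idea would be a buffer enlarged with realloc as needed.

```python
import gzip

def read_all(path, chunk_size=1 << 20):
    """Decompress in a single pass into a buffer that grows on demand."""
    buf = bytearray()
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of the compressed stream
                break
            buf += chunk   # bytearray extends in place
    return buf
```

After the call, len(buf) is the exact uncompressed size and the data is already loaded, so no second pass is needed.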

monkeyking 2009-12-28 15:24:38

Answer 2

+6 A:

There isn't one.

The only way to get the exact uncompressed size of a compressed stream is to actually go and decompress it (even if you write everything to /dev/null and just count the bytes).

It's worth noting that ISIZE is defined as

ISIZE (Input SIZE)
This contains the size of the original (uncompressed) input
data modulo 2^32.

in the gzip RFC (RFC 1952), so it isn't actually breaking at the 32-bit barrier; what you're seeing is expected behavior.
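To make the modulo behavior concrete, here is a small sketch in Python that reads ISIZE from the last 4 bytes of a file. The function name `read_isize` is made up for illustration, and the sketch assumes a single-member .gz file with nothing appended after the trailer; for concatenated members the value would describe only the last one. The gzip format stores the field as a little-endian unsigned 32-bit integer.

```python
import struct

def read_isize(path):
    """Return the ISIZE trailer field: uncompressed size modulo 2**32."""
    with open(path, "rb") as f:
        f.seek(-4, 2)  # position 4 bytes before end of file
        (isize,) = struct.unpack("<I", f.read(4))  # little-endian uint32
    return isize

# What ISIZE would hold for a 5120000000-byte input:
wrapped = 5120000000 % 2**32  # 825032704
```

So for the 5120000000-byte test file mentioned in the other answer, the trailer would hold 825032704, which is correct per the definition but of no use for recovering the true size.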

Kevin Montrose 2009-12-27 09:26:49
