ansaurus

Question

How to tell if a file is gzip compressed?

Answer 1

+10 A:

The magic number for gzip-compressed files is 1f 8b. Testing for it is not 100% reliable, but it is highly unlikely that an "ordinary text file" starts with those two bytes; in UTF-8 the sequence is not even legal, because 8b is a continuation byte with no lead byte before it.
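A minimal sketch of that check in Python, assuming the file is readable and that peeking at its first two bytes is acceptable. The names GZIP_MAGIC and looks_like_gzip are made up for illustration and are not part of any library.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number.

    Only the header bytes are inspected; the rest of the file could still
    be truncated or corrupt.
    """
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

A file shorter than two bytes simply compares unequal and is reported as not gzip.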

Usually, though, gzip-compressed files carry the suffix .gz. Even gzip(1) itself won't unpack files without it unless you --force it to. You could conceivably rely on that, but you'd still have to deal with a possible IOError (which you have to do in any case).

One problem with your approach is that gzip.GzipFile() will not throw an exception if you feed it an uncompressed file; only a later read() will. This means that you would probably have to implement some of your program logic twice. Ugly.
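One way to keep that duplication small is to force the header check right after opening. This is only a sketch, assuming Python 3, where IOError is an alias of OSError and a bad gzip header raises a subclass of it. The helper name open_maybe_gzip is hypothetical.

```python
import gzip


def open_maybe_gzip(path):
    """Return a binary file object for `path`, decompressing if it is gzip.

    gzip.open() succeeds even on a plain file; the header is only checked
    on the first read. Reading one byte forces that check up front.
    """
    f = gzip.open(path, "rb")
    try:
        f.read(1)   # raises OSError if the header is not gzip
        f.seek(0)   # rewind so the caller sees the whole stream
        return f
    except OSError:
        f.close()
        return open(path, "rb")  # fall back to reading the raw bytes
```

The caller can then use a single code path, for example `with open_maybe_gzip(name) as f: data = f.read()`. A truncated or corrupt gzip stream can still fail on a later read, so that case is not covered here.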

hop 2010-09-13 18:30:11

gzip compressed files often have the .gz file extension (in fact, I don't think I've ever seen a .gzip extension), but it's generally unsafe to rely on file extension to test for the type of file anyhow.

CanSpice 2010-09-13 18:51:05

@CanSpice: of course, typo

hop 2010-09-13 18:52:03

Does it? - The gzip C library will transparently read uncompressed files. Although it will write files uncompressed, it puts CRC codes through them to allow "gzip -t" (caught me out once).

Martin Beckett 2010-09-13 18:53:46

@Martin: it does:

$ gunzip foo
gzip: foo: unknown suffix -- ignored

hop 2010-09-13 19:03:08

The C 'library' gzip, i.e. gzopen/gzread/etc., will transparently read uncompressed files. It has an open mode with compression=none, which does NOT write unchanged flat files.

Martin Beckett 2010-09-13 20:15:59
