tags:
views: 70
answers: 3

When I gzip an SWF file, the size goes from 1.21 MB to 1.86 MB... So my question is partly answered already. The real question is: how is this possible? My guess, as a colleague of mine put it, is that the SWF is already binary and can't be compressed any further.

The conclusion is also that gzipping SWF files shouldn't be done.

+5  A: 

SWF is already an encoded format, and that encoding includes compression. It's perfectly possible that trying to compress an already compressed file will result in a bigger file. The same thing happens when you try to ZIP a JPEG or PNG file, for example.
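As a rough illustration (not part of the original answer), here is a small Python sketch. It uses random bytes as a stand-in for already-compressed data, since both have essentially no redundancy left for gzip to exploit:

    import gzip
    import os

    # Random bytes behave like already-compressed data: nothing left to shrink.
    data = os.urandom(1_000_000)
    compressed = gzip.compress(data)

    print(len(data))        # 1000000
    print(len(compressed))  # slightly more than 1000000: gzip still adds a
                            # header, a checksum and per-block overhead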

What your colleague said is not true, though: plenty of binary files can be compressed, BMP files for example.

Pablo Santa Cruz
+5  A: 

Have a look at the first three magic bytes of the SWF file. If they're FWS, it's an uncompressed file; if they're CWS, it's already compressed with zlib and can't be compressed further using gzip (which uses the same DEFLATE compression as zlib). Although it should only get a bit larger, growing 50% in size is extreme...
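For reference, a minimal Python sketch of that check (the function name and the file name are made up for illustration):

    def swf_signature(path):
        # The first three bytes of an SWF file identify the container:
        # FWS = uncompressed, CWS = zlib-compressed body.
        with open(path, "rb") as f:
            magic = f.read(3)
        if magic == b"FWS":
            return "uncompressed"
        if magic == b"CWS":
            return "zlib-compressed"
        return "unknown signature"

    print(swf_signature("movie.swf"))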

Typically, uncompressed SWF files can be compressed a bit, but not that much. The SWF file format is very optimized and typically generates very small and compact files.

By the way, if you use my tool Precomp together with a tool that compresses better than gzip (7-Zip, for example), you can compress most compressed SWF files a bit further: first run Precomp on the file, then use 7-Zip on the resulting PCF file.

This will also detect and recompress JPG files inside SWF files. It is a completely lossless process, and it also works for some other already compressed file types such as ZIP, JPG, PNG and GIF.

schnaader
+1  A: 

In information theory there is a concept called entropy, which is a measure of the "true" amount of information in a message (in your example, the message is the SWF file). One common unit for this measure is the bit.

A 1.21 MB file occupies approximately 10,150,215 bits. However, its entropy may be less than 10,150,215 bits, because there is some order, or predictability, in the data. Let's say you measured that file's entropy and came to the conclusion that it is only 9,000,000 bits. This means you can't compress it losslessly to a size smaller than 9,000,000 bits.
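If you want to play with this idea, here is a rough Python sketch (the file name is assumed) that computes a zeroth-order entropy estimate from byte frequencies. It ignores correlations between bytes, so the true entropy can be lower than this estimate, but it illustrates the difference between bits on disk and estimated bits of information:

    import math
    from collections import Counter

    def estimated_entropy_bits(path):
        data = open(path, "rb").read()
        counts = Counter(data)
        total = len(data)
        # Shannon entropy per byte, in bits, assuming independent bytes
        per_byte = -sum((c / total) * math.log2(c / total)
                        for c in counts.values())
        return per_byte * total

    print(estimated_entropy_bits("movie.swf"))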

But compression algorithms end up adding some extra data to the compressed file so that they can decompress it later: they include information about the kind of "abbreviations" made when compressing the data. This means the theoretical limit given by the entropy won't quite be reached, because of that algorithm-specific overhead.

If your file is already compressed, its size is already close to the entropy of the original data. When you try to compress it again (especially, as in your case, with the same algorithm), the size reduction will be small, and you will be adding yet another layer of algorithm-specific extra data. If that extra data is larger than the extra size reduction, your twice-compressed file will be larger than the one compressed only once.
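A small Python sketch of that effect (the sample data is made up): the first compression removes the redundancy, the second pass has almost nothing left to remove but still pays for its own overhead:

    import gzip

    original = b"some fairly repetitive text " * 10_000
    once = gzip.compress(original)   # shrinks a lot
    twice = gzip.compress(once)      # slightly larger than `once`

    print(len(original), len(once), len(twice))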

Martinho Fernandes