views:

50

answers:

2

I'm wondering why I need to cut off the last 4 characters after using gzcompress().

Here is my code:

header("Content-Encoding: gzip");
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";
$index = $smarty->fetch("design/templates/main.htm") ."\n<!-- Compressed by gzip -->";
$this->content_size = strlen($index);
$this->content_crc = crc32($index);
$index = gzcompress($index, 9);
$index = substr($index, 0, strlen($index) - 4); // Why cut off ??
echo $index;
echo pack('V', $this->content_crc) . pack('V', $this->content_size);

When I don't cut off the last 4 chars, the source ends like:

[...]
<!-- Compressed by gzip -->N

When I cut them off it reads:

[...]
<!-- Compressed by gzip -->

I could see the additional N only in Chrome's code inspector (not in Firefox's or IE's source view). But there seem to be four additional characters at the end of the code.

Can anyone explain why I need to cut off 4 chars?

+2  A: 

gzcompress() produces the zlib format described in RFC 1950; the last 4 bytes you're chopping off are the Adler-32 checksum. This is what HTTP calls the "deflate" encoding, so you can just send "Content-Encoding: deflate" and not manipulate anything.

If you want to use gzip, use gzencode(), which produces the gzip format.
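The two options above can be sketched roughly as follows. This is only an illustration: the sample `$page` string stands in for the rendered template, and the header() calls are left as comments because they only make sense inside a web request.

```php
<?php
// Stand-in for the rendered template from the question (assumption).
$page = "<html><body>Hello</body></html>\n<!-- Compressed by gzip -->";

// Option 1: zlib format (RFC 1950), sent as "Content-Encoding: deflate".
$deflateBody = gzcompress($page, 9);
// header('Content-Encoding: deflate');
// echo $deflateBody;

// Option 2: gzip format (RFC 1952), sent as "Content-Encoding: gzip".
// gzencode() writes the header, the compressed data, and the trailer itself.
$gzipBody = gzencode($page, 9);
// header('Content-Encoding: gzip');
// echo $gzipBody;
```

In both cases nothing has to be cut off or appended by hand.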

nos
Can you figure out what the last echo is for? Why is he removing one checksum to add another (albeit a different one)? Why does he then add the length? Is this some other format?
Artefacto
Seems he's trying to emulate gzip by echoing a gzip header and appending the crc32 and length (as per the gzip spec).
nos
Good point. The "N" seems to come from the last line, not from gzcompress(). If I comment out the last line and the 4-char-cut-off line, there is no additional output. As for why these lines are there, I don't know. The code is from a person who worked on the same project before me, so I'm trying to figure out as well why these lines are there. So you mean this does produce a deflate encoding? I guess I'll be better off with gzencode() then.
JochenJung
See "gzdecode" here: http://phpxref.com/xref/erfurtwiki/plugins/lib/upgrade.php.source.html - The gzip header just contains a few flags and says that the following chunk is deflate compressed. The zlib format that gzcompress() emits wraps the deflate stream in a two-byte header and an Adler-32 checksum of the uncompressed data. gzip drops that wrapper and instead adds a CRC-32 of the decompressed content, plus the size of the uncompressed data.
mario
+2  A: 

gzcompress implements the ZLIB compressed data format that has the following structure:

     0   1
   +---+---+
   |CMF|FLG|   (more-->)
   +---+---+

(if FLG.FDICT set)

     0   1   2   3
   +---+---+---+---+
   |     DICTID    |   (more-->)
   +---+---+---+---+

   +=====================+---+---+---+---+
   |...compressed data...|    ADLER32    |
   +=====================+---+---+---+---+

Here you see that the last four bytes are an Adler-32 checksum.
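A small check of that layout, as a sketch: it compares the last four bytes of gzcompress() output with the Adler-32 of the uncompressed input, computed via hash('adler32', ...). The sample string is an assumption for illustration.

```php
<?php
$data = "Hello, zlib!\n<!-- Compressed by gzip -->";
$zlib = gzcompress($data, 9);

// Trailer: the last four bytes of the ZLIB stream.
$trailer = substr($zlib, -4);

// Adler-32 of the *uncompressed* data as 4 raw bytes, most significant byte first.
$adler = hash('adler32', $data, true);
$trailerIsAdler = ($trailer === $adler);

// Header: CMF and FLG. The low nibble of CMF is the compression method (8 = deflate),
// and CMF * 256 + FLG must be a multiple of 31.
$cmf = ord($zlib[0]);
$flg = ord($zlib[1]);
$headerValid = (($cmf & 0x0f) === 8) && ((($cmf << 8) | $flg) % 31 === 0);
```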

In contrast to that, the GZIP file format is a series of so-called members with the following structure:

   +---+---+---+---+---+---+---+---+---+---+
   |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
   +---+---+---+---+---+---+---+---+---+---+

(if FLG.FEXTRA set)

   +---+---+=================================+
   | XLEN  |...XLEN bytes of "extra field"...| (more-->)
   +---+---+=================================+

(if FLG.FNAME set)

   +=========================================+
   |...original file name, zero-terminated...| (more-->)
   +=========================================+

(if FLG.FCOMMENT set)

   +===================================+
   |...file comment, zero-terminated...| (more-->)
   +===================================+

(if FLG.FHCRC set)

   +---+---+
   | CRC16 |
   +---+---+

   +=======================+
   |...compressed blocks...| (more-->)
   +=======================+

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   |     CRC32     |     ISIZE     |
   +---+---+---+---+---+---+---+---+

As you can see, GZIP uses a CRC-32 checksum for the integrity check.
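The same kind of check for the gzip trailer, again only as a sketch with an assumed sample string: the last eight bytes of gzencode() output should be the CRC-32 and the length of the uncompressed data, both as little-endian 32-bit values.

```php
<?php
$data = "Hello, gzip!\n<!-- Compressed by gzip -->";
$gz = gzencode($data, 9);

// Fixed header fields: ID1, ID2 and CM (8 = deflate).
$magic  = substr($gz, 0, 2);
$method = ord($gz[2]);

// Trailer: CRC32 followed by ISIZE, each 4 bytes, least significant byte first.
$crcBytes  = substr($gz, -8, 4);
$sizeBytes = substr($gz, -4);

// pack('V', ...) writes an unsigned 32-bit little-endian integer.
$crcMatches  = ($crcBytes === pack('V', crc32($data)));
$sizeMatches = ($sizeBytes === pack('V', strlen($data)));
```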

So to analyze your code:

  • echo "\x1f\x8b\x08\x00\x00\x00\x00\x00"; – puts out the following header fields:
    • 0x1f 0x8b – ID1 and ID2, identifiers to identify the data format (these are fixed values)
    • 0x08 – CM, compression method that is used; 8 denotes the use of the DEFLATE data compression format (RFC 1951)
    • 0x00 – FLG, flags
    • 0x00000000 – MTIME, modification time
    • the fields XFL (extra flags) and OS (operating system) are not written by this echo; the first two bytes of the gzcompress() output (the ZLIB header fields CMF and FLG) end up in their place
  • echo $index; – puts out compressed data according to the DEFLATE data compression format
  • echo pack('V', $this->content_crc) . pack('V', $this->content_size); – puts out the CRC-32 checksum and the size of the uncompressed input data in binary
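To make the analysis concrete, here is a sketch that assembles a gzip member by hand in two ways: once as in the question, and once with an explicit 10-byte header and a raw DEFLATE stream from gzdeflate(). The sample data and the choice of XFL = 0 and OS = 255 ("unknown") in the second variant are assumptions for illustration.

```php
<?php
$data = "Hello, gzip!\n<!-- Compressed by gzip -->";
$trailer = pack('V', crc32($data)) . pack('V', strlen($data)); // CRC32 + ISIZE

// Variant from the question: 8 header bytes, then ZLIB output without its Adler-32.
// The 2-byte ZLIB header (CMF, FLG) lands where gzip expects XFL and OS.
$fromQuestion = "\x1f\x8b\x08\x00\x00\x00\x00\x00"
    . substr(gzcompress($data, 9), 0, -4)
    . $trailer;

// Variant with all 10 header bytes written out and a raw DEFLATE stream (RFC 1951).
$explicit = "\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff"
    . gzdeflate($data, 9)
    . $trailer;

// Bytes that occupy the XFL and OS positions in the question's variant.
$xflAndOs = substr($fromQuestion, 8, 2);
```

Both strings should decode with gzdecode(), though gzencode() remains the simpler route.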
Gumbo
Accepted, for the most detailed information about the gzip format.
JochenJung