
Hi, I need to pre-compress some very large HTML/XML/JSON files (large data dumps) using either gzip or deflate. I never want to serve the files uncompressed. They are so large and repetitive that compression will probably work very, very well, and while some older browsers don't support decompression, my typical customers will not be using them (although it would be nice if I could generate some kind of "hey, you need to upgrade your browser" message).

I auto-generate the files, and I can easily generate .htaccess files to go along with each file type. Essentially what I want is some always-on version of mod_gunzip. Because the files are large, and because I will be repeatedly serving them, I need a method that allows me to compress once, really well, on the command line.

I have found some information on this site and others about how to do this with gzip, but I wondered if someone could step me through how to do this with deflate. Bonus points for a complete answer that includes what my .htaccess file should look like, as well as the command-line code I should use (GNU/Linux) to obtain optimal compression. Super bonus points for an answer that also addresses how to send a "sorry, no file for you" message to non-compliant browsers.

It would be lovely if we could create a "precompression" tag to cover questions like this.

-FT

A: 

A quick way to compress content without dealing directly with mod_gzip/mod_deflate is using ob_gzhandler and modifying the headers (before any output is sent to the browser).

<?php
/* Replace CHANGE_ME with the correct MIME type of your large file,
   e.g. application/json
*/

ob_start('ob_gzhandler'); // compress the output buffer if the client accepts gzip/deflate
header('Content-Type: CHANGE_ME; charset=UTF-8');
header('Cache-Control: must-revalidate');
$offset = 60 * 60 * 2; // expire in two hours
$ExpStr = 'Expires: ' . gmdate('D, d M Y H:i:s', time() + $offset) . ' GMT';
header($ExpStr);

/* Stuff to generate your large files here */
Stolz
This is doing gzip on the fly, but the file already exists as HTML/JSON/XML/whatever on the disk. I suppose that I could use PHP like this to generate the right headers and then echo the file (or equivalent), but isn't there a way to do that in just Apache?
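For the PHP route, I mean something like this rough sketch (the file name and MIME type are made-up placeholders; it assumes the pre-gzipped file is already on disk and the client sent Accept-Encoding: gzip):

<?php
// Hypothetical example: hand an existing pre-gzipped file to the client as-is.
// You would want to check $_SERVER['HTTP_ACCEPT_ENCODING'] for 'gzip' first.
header('Content-Type: application/json; charset=UTF-8');
header('Content-Encoding: gzip');
header('Content-Length: ' . filesize('bigdump.json.gz'));
readfile('bigdump.json.gz');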
ftrotter
+4  A: 

Edit: Found AddEncoding in mod_mime

This works:

<IfModule mod_mime.c>
 <Files "*.html.gz">
  ForceType text/html
 </Files>
 <Files "*.xml.gz">
  ForceType application/xml
 </Files>
 <Files "*.js.gz">
  ForceType application/javascript
 </Files>
 <Files "*.gz">
  AddEncoding gzip .gz
 </Files>
</IfModule>

The docs make it sound like only the AddEncoding should be needed, but I didn't get that to work.
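For the "sorry, no file for you" part of the question, one hedged sketch (assuming mod_rewrite is enabled) would be to refuse requests for the pre-compressed files whenever the client doesn't advertise gzip support:

<IfModule mod_rewrite.c>
 RewriteEngine On
 # Refuse the .gz files if the client did not send Accept-Encoding: gzip
 RewriteCond %{HTTP:Accept-Encoding} !gzip [NC]
 RewriteRule \.gz$ - [F,L]
</IfModule>

You could then point ErrorDocument 403 at a "please upgrade your browser" page.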

Also, Lighttpd's mod_compression can compress and cache (the compressed) files.
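Untested, but the lighttpd 1.4 configuration for that looks roughly like this (cache path and types are just examples):

server.modules += ( "mod_compress" )
compress.cache-dir = "/var/cache/lighttpd/compress/"
compress.filetype  = ( "text/html", "application/xml", "application/json" )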

Zash
If you can include the MIME type code I would accept this answer; it looks like no one is going to give me a complete answer including the deflate option...
ftrotter
Would that also be a:`<FilesMatch "\.html\.z$"><IfModule mod_headers.c>Header set Content-Encoding: deflate</IfModule></FilesMatch>`?
maxwellb
+2  A: 

For the command line, compile zlib's zpipe: http://www.zlib.net/zpipe.c and then

zpipe < BIGfile.html > BIGfile.htmlz

for example.
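Roughly, assuming gcc and the zlib development headers are installed:

# fetch, build and run zlib's zpipe example
wget http://www.zlib.net/zpipe.c
cc -O2 -o zpipe zpipe.c -lz
./zpipe < BIGfile.html > BIGfile.htmlz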

Then, using Zash's example, set up a filter to change the header. This should leave you with raw deflate files, which modern browsers probably support.
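An untested sketch of that filter for .htmlz files (mod_headers assumed):

<FilesMatch "\.htmlz$">
 ForceType text/html
 <IfModule mod_headers.c>
  Header set Content-Encoding deflate
 </IfModule>
</FilesMatch>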

For another way to compress files, take a look at using pigz with its zlib (-z) or PKWare zip (-K) compression options. Test whether these work when served with Content-Encoding set.
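For example (untested here; -k keeps the original file, -9 is the best standard level):

pigz -9 -k BIGfile.html      # gzip output:  BIGfile.html.gz
pigz -9 -z -k BIGfile.html   # zlib output:  BIGfile.html.zz
pigz -9 -K -k BIGfile.html   # zip output:   BIGfile.html.zip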

maxwellb
Oh, and change Z_DEFAULT_COMPRESSION in zpipe to Z_BEST_COMPRESSION.
maxwellb
Does this do "deflate" compression or just gzip?
ftrotter
zpipe, at least, does deflate. Try setting up a test file for pigz compression; I honestly just don't have the test environment to test this myself right now. pigz will also compress faster by utilizing multiple cores. Woo.
maxwellb
+2  A: 

If I were you, I would look at built-in filesystem compression instead of doing this at the Apache layer.

On Solaris, ZFS has transparent compression; use `zfs set compression=on` to compress the whole filesystem. Similarly, Windows can compress folders, and Apache will serve the content oblivious to the fact that it's compressed on disk. Linux has filesystems that do transparent compression as well.
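An untested sketch, assuming a ZFS dataset named tank/www holds the document root:

# enable compression on the dataset (gzip-9 trades CPU for the best ratio)
zfs set compression=gzip-9 tank/www
# check the achieved ratio once the files have been (re)written
zfs get compressratio tank/www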

jskaggz
Great comment. Which filesystem on Linux, and any advice on doing this in a cloud instance? How do I properly set the headers (so that clients can understand the content)?
ftrotter
It's not as elegant on Linux, but there are FUSE modules that will do transparent compression/decompression, like this one: http://miio.net/wordpress/projects/fusecompress/ You wouldn't have to do anything with the headers in Apache, because as far as Apache's concerned, they're normal files. :-)
jskaggz
I don't see how this answer addresses the problem. It sounds as if ftrotter wants to pre-compress the files to save the processing overhead at request time. If using a transparent file system compression, Apache will still have to re-compress at request time.
Jason R. Coombs
I think I must have misread the question. I thought the intent was to save space on the machine, but you're right, after re-reading the question I understand.
jskaggz