I have tried a few things to enable gzip compression using PHP Simple HTML DOM Parser, but nothing has worked so far. Using ini_set I've managed to change the user agent, so I figured it might also be possible to enable gzip compression?

include("simpdom/simple_html_dom.php");
ini_set('zlib.output_compression', 'On');   
$url = 'http://www.whatsmyip.org/http_compression/';
$html = file_get_html($url);
print $html;

The page above reports whether the request was served with compression. Please let me know if I am going about this completely the wrong way.

====

UPDATE

For anyone else trying to achieve the same thing, it's best to just use cURL, then use the dom parser like so:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page as a string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // Request gzip; cURL decodes the response
curl_setopt($ch, CURLOPT_TIMEOUT, 5); // Give up after 5 seconds
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects

$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);

$html = str_get_html($return); // Parse the fetched markup
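
If you need this in more than one place, the same options can be wrapped up roughly like this. It is only a sketch: gzip_curl_options() and fetch_gzip() are names made up for illustration (they are not part of cURL or Simple HTML DOM), the user agent is left out for brevity, and error handling is reduced to returning null.

```php
<?php
// Hypothetical helper: collects the cURL options from the snippet above
// so they can be inspected or reused. Not part of any library.
function gzip_curl_options(string $url, int $timeout = 5): array
{
    return [
        CURLOPT_URL            => $url,     // Target site
        CURLOPT_RETURNTRANSFER => true,     // Return the page as a string
        CURLOPT_ENCODING       => 'gzip',   // Ask for gzip; cURL decodes it
        CURLOPT_TIMEOUT        => $timeout, // Seconds before giving up
        CURLOPT_FOLLOWLOCATION => true,     // Follow redirects
    ];
}

// Hypothetical helper: fetches $url and returns the decoded body,
// or null if the transfer failed.
function fetch_gzip(string $url): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, gzip_curl_options($url));
    $body = curl_exec($ch);
    $failed = ($body === false) || curl_errno($ch) !== 0;
    return $failed ? null : $body;
}
```

Usage would then be something like $body = fetch_gzip($url); followed by $html = str_get_html($body); once you have checked that $body is not null.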
A: 

Just add the following line at the very top of the PHP script that outputs the data:

  ob_start("ob_gzhandler");

-------Update--------

You can also try enabling gzip compression sitewide via a .htaccess file. Something like this should gzip all of your site's content except images:

# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
# BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

# NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
# the above regex won't work. You can use the following
# workaround to get the desired effect:
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html

# Don't compress images
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
Pablo
Thanks for the response. However, according to that compression test page, it's still not working. The only way I can get compression to work is to use cURL: curl_setopt($ch, CURLOPT_ENCODING, "gzip"); Any other ideas?
brant
Pablo - great code :) Keep in mind, though, he's "requesting" gzip content, not sending it in this case. He's going to another server, asking for data and trying to say "give it to me compressed, I can handle it".
Dan Heberden
+1  A: 

CURLOPT_ENCODING tells the remote server that gzipped data is accepted, so the response can come back compressed. The server-side settings (ob_start("ob_gzhandler") or a php.ini setting) tell your own server to OUTPUT gzipped data.

Without that request header, it's the same as visiting the page with a browser that doesn't support gzip. To accept gzipped data, you have to use cURL so you can make that distinction.
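
To show what "accepting" gzip involves on the client side, here is a rough sketch of doing it by hand with file_get_contents. Both the Accept-Encoding request header and the decoding have to be handled manually, which is part of why cURL is the simpler route. decode_if_gzipped() and fetch_with_stream() are hypothetical names for illustration, and this assumes the zlib extension is loaded.

```php
<?php
// Hypothetical helper: gunzip the body only when the response headers
// say it was gzip-encoded; otherwise return it unchanged.
function decode_if_gzipped(string $body, array $headers): string
{
    foreach ($headers as $header) {
        if (preg_match('/^Content-Encoding:\s*gzip\s*$/i', $header)) {
            $decoded = @gzdecode($body);
            // Fall back to the raw body if it was not valid gzip data.
            return $decoded === false ? $body : $decoded;
        }
    }
    return $body;
}

// Hypothetical helper: request gzip via a stream context, then decode.
// Returns null if the request fails.
function fetch_with_stream(string $url): ?string
{
    $context = stream_context_create([
        'http' => [
            'header'  => "Accept-Encoding: gzip\r\n", // "I can handle gzip"
            'timeout' => 5,
        ],
    ]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return null;
    }
    // PHP's HTTP wrapper fills $http_response_header in the local scope.
    // After redirects it holds the headers of every response in the chain.
    return decode_if_gzipped($body, $http_response_header ?? []);
}
```

The decoded string could then be passed to str_get_html() in the same way as the cURL result.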

Dan Heberden
Thanks for the clarification Dan. I tested your method with file_get_html and it still didn't work. Seems like there is no shortcut and curl has to be used first.
brant
Well, that approach is really meant for file_get_contents, but I thought it'd be worth a shot.
Dan Heberden