I am trying to fetch a page with cfhttp so that I can parse information out of it. The response headers of the page I am requesting are:

Content-Encoding: gzip

Connection: Keep-Alive

Content-Length: 19066

Server: IBM_HTTP_Server

Vary: Accept-Encoding, User-Agent

Content-Language: en-US

Cache-Control: no-cache="set-cookie, set-cookie2"

Content-Type: text/html;charset=ISO-8859-1

I set the charset to ISO-8859-1, but I am getting the following in the FileContent (only a small sample is shown below, but I think it gets the point across).

EðÑq·Oã?·Ì\ZóL¯þ´Vú5ðbä£ÿæ¾_HÉÒñQãO\Çþãë85ÁÜ à±°ùÖ}&bßý?,u?2SùQyk5g?UÛ3Ѹfã×ARÃi_iûRã _ òCA¿-ß."b /¯ßíWÝÆ´}w~,°iøÜCáÇþ@ÃZ5¤ïsÁ8½°ì* ZÜéjOÝK/Ë4§ÈG5×ä*¬6ÚwÇ0]ã:àÑþé¬G"ÅÁl/t° jlá»5¶&¯lìYìºØ'yDð½|#ý<ñìTé%¾ï¬ùƪx¶}«±o9»ë¼ÂÆÒï'w8Y?÷ðxsllû 6íqüGÞsÜóÀx·ªk®XºàåZ{íÁ½åo÷mbq¥ÝÃ8M

I tried other charsets and suspect the gzip encoding may be causing the problem, but I am unsure how to test whether that is the issue. Any suggestions or help would be greatly valued.
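One way to test the gzip theory, as a rough sketch rather than a confirmed fix: fetch the body as raw bytes and check whether it starts with the gzip magic bytes 1F 8B. The `targetUrl` value and the variable names `probe`, `body`, `looksGzipped`, and `encodingHeader` are placeholders that are not part of the original code, and this assumes cfhttp hands back the bytes without decompressing them itself.

```cfm
<!--- Hypothetical placeholder; substitute the page being debugged. --->
<cfset targetUrl = "http://www.example.com/">

<!--- getasbinary="yes" returns the body as raw bytes, with no charset decoding applied. --->
<cfhttp url="#targetUrl#" method="get" getasbinary="yes" result="probe">
</cfhttp>

<cfset body = probe.FileContent>

<!--- gzip streams begin with bytes 1F 8B (31, 139). Java bytes are signed, so mask with 255 first. --->
<cfset looksGzipped = false>
<cfif isBinary(body) and arrayLen(body) gte 2>
    <cfset looksGzipped = (bitAnd(body[1], 255) eq 31) and (bitAnd(body[2], 255) eq 139)>
</cfif>

<!--- The header may be absent, so check before reading it. --->
<cfset encodingHeader = "(not sent)">
<cfif structKeyExists(probe.ResponseHeader, "Content-Encoding")>
    <cfset encodingHeader = probe.ResponseHeader["Content-Encoding"]>
</cfif>

<cfoutput>
    Content-Encoding header: #encodingHeader#<br>
    Body starts with gzip magic bytes: #looksGzipped#
</cfoutput>
```

If the second line reports true, the garbled text is most likely compressed data being decoded as characters, and no charset setting will make it readable.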

Below is my Code

<cfhttp 
    METHOD="get"
    throwonerror="yes" 
    CHARSET="ISO-8859-1"
    URL="http://www.cars.com/for-sale/searchresults.action?sf1Dir=DESC&prMn=1&crSrtFlds=stkTypId-feedSegId-pseudoPrice&rd=100000&zc=44203&PMmt=0-0-0&stkTypId=28881&sf2Dir=ASC&sf1Nm=price&sf2Nm=miles&feedSegId=28705&searchSource=UTILITY&pgId=2102&rpp=10">

    <cfhttpparam type="Header" name="Accept-Encoding" value="deflate;q=0">
    <cfhttpparam type="Header" name="TE" value="deflate;q=0">
</cfhttp>

<cfset listings = #cfhttp.FileContent#>
<cfoutput>
    #listings#
</cfoutput>

I have also tried the headers:

    <cfhttpparam type="Header" name="Accept-Encoding" value="*">
    <cfhttpparam type="Header" name="TE" value="deflate;q=0">

And tried removing the 'Accept-Encoding' header and just leaving the TE.

UPDATE: I still haven't figured it out, but I found something that might help someone help me. When I ran file_get_contents against the same page from a test PHP server of mine, it worked fine. When I then pointed the same cfhttp code at that PHP page (which in turn fetches the page I need), that worked fine too. Thanks for the suggestions so far.

A: 

The first thing I would do is make sure that it's not the source content/server that's the problem by trying your same code against other pages. If they work fine, then it's likely the server/content that you're trying to consume. If they have the same problem, then the issue is in your code. It would also be helpful if you posted your code.

Adam Tuttle
Thanks. It does seem to be an issue with the content I am trying to consume, but not with the server as a whole: I am able to use cfhttp on any of its CSS or JS files, and only the HTML pages seem to be affected.
Patcouch22
+3  A: 

Per the Content-Encoding header, what you are seeing is the gzipped content of the page. It needs to be decompressed before it is useful to you. I assume you can do this with cfzip, but I have no experience doing it.
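As an alternative to cfzip, decompression can be sketched with the Java classes that ColdFusion exposes through createObject. This is an untested outline, not a confirmed solution: it assumes FileContent really holds raw gzip bytes, and `targetUrl`, `raw`, `byteIn`, `gzipIn`, and `byteOut` are illustrative names that do not appear in the question.

```cfm
<!--- Hypothetical placeholder; substitute the page being fetched. --->
<cfset targetUrl = "http://www.example.com/">

<!--- Ask for gzip explicitly and keep the body as raw bytes. --->
<cfhttp url="#targetUrl#" method="get" getasbinary="yes" result="raw">
    <cfhttpparam type="header" name="Accept-Encoding" value="gzip">
</cfhttp>

<cfscript>
    // GZIPInputStream inflates the data as it is read; its constructor throws
    // an exception if the bytes are not actually gzip data.
    byteIn  = createObject("java", "java.io.ByteArrayInputStream").init(raw.FileContent);
    gzipIn  = createObject("java", "java.util.zip.GZIPInputStream").init(byteIn);
    byteOut = createObject("java", "java.io.ByteArrayOutputStream").init();

    // Copy one byte at a time; read() returns -1 at end of stream.
    b = gzipIn.read();
    while (b neq -1) {
        byteOut.write(javaCast("int", b));
        b = gzipIn.read();
    }
    gzipIn.close();

    // Decode using the charset the server declared in Content-Type.
    listings = byteOut.toString("ISO-8859-1");
</cfscript>

<!--- Show the first part of the decoded page, escaped for display. --->
<cfoutput>#htmlEditFormat(left(listings, 500))#</cfoutput>
```

If the GZIPInputStream constructor throws, the body was not gzip data after all, which would point back toward a charset or server-side cause.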

This post seems to indicate that you can add a header in your request to have it unzipped/deflated before being returned:

<cfhttp ...>
    <cfhttpparam type="Header" name="Accept-Encoding" value="deflate;q=0">
    <cfhttpparam type="Header" name="TE" value="deflate;q=0">
</cfhttp>
Daniel Sellers
This doesn't seem to be the issue. It was my first thought too, but if gzip were the problem I would expect to get a connection failure instead. It looks more like a charset issue, yet no charset I try seems to work.
Patcouch22