I'm using fsockopen in a small cron job to read and parse feeds from different servers. For the most part this works very well, but on some servers I get very weird lines in the response, like this:

<language>en</language>
 <sy:updatePeriod>hourly</sy:updatePeriod>
 <sy:updateFrequency>1</sy:updateFrequency>

11
 <item>
  <title>
1f
July 8th, 2010</title>
  <link>
32
http://darkencomic.com/?p=2406</link>
  <comments>
3e

But when I open the feed in, e.g., Notepad++, it looks just fine:

<language>en</language>
 <sy:updatePeriod>hourly</sy:updatePeriod>
 <sy:updateFrequency>1</sy:updateFrequency>
   <item>
  <title>July 8th, 2010</title>
  <link>http://darkencomic.com/?p=2406</link>
  <comments>

(That is just an excerpt.) Am I doing something wrong here, or is this beyond my control? I'd be grateful for any ideas on how to fix it. Here's the part of the code I use to retrieve the feeds:

$fp = @fsockopen($url["host"], 80, $errno, $errstr, 5);
if (!$fp) {
    throw new UrlException("($errno) $errstr ~~~ on opening " . $url["host"]);
} else {
    // Hand-written HTTP/1.1 request; the reply is read raw (headers + body)
    $out = "GET " . $path . " HTTP/1.1\r\n"
         . "Host: " . $url["host"] . "\r\n"
         . "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    $contents = '';
    // Read in 128-byte pieces until the server closes the connection
    while (!feof($fp)) {
        $contents .= stream_get_contents($fp, 128);
    }
    fclose($fp);
}
A: 

I don't see anything strange that could cause that kind of behaviour. Is there any way you can use cURL to do this for you? It might solve the problem altogether :)

Dennis Haarbrink
A:

This looks like HTTP chunked transfer encoding, which is HTTP's way of splitting a response into several smaller parts. Quoting:

Each non-empty chunk starts with the number of octets of the data it embeds (size written in hexadecimal) followed by a CRLF (carriage return and line feed), and the data itself.
The chunk is then closed with a CRLF.
In some implementations, white space characters (0x20) are padded between chunk-size and the CRLF.
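Applied to the excerpt in the question, lines such as 11, 1f, 32 and 3e would then be chunk sizes in hexadecimal (17, 31, 50 and 62 bytes). Below is a minimal sketch of a decoder for that format, assuming the response headers have already been stripped off so that only the body is passed in. The function name decode_chunked is hypothetical, and chunk extensions and trailer fields are simply discarded.

```php
<?php
/**
 * Decode an HTTP/1.1 chunked message body (minimal sketch).
 * Hypothetical helper: expects the raw body only (headers already removed);
 * chunk extensions (";name=value") and trailer fields are ignored.
 */
function decode_chunked(string $body): string
{
    $decoded = '';
    $pos = 0;
    $len = strlen($body);

    while ($pos < $len) {
        // Each chunk starts with a line holding its size in hex, ended by CRLF.
        $lineEnd = strpos($body, "\r\n", $pos);
        if ($lineEnd === false) {
            throw new UnexpectedValueException('Malformed chunk: no CRLF after size');
        }
        $sizeLine = substr($body, $pos, $lineEnd - $pos);
        // Drop any chunk extension and the optional padding spaces.
        $sizeHex = trim(explode(';', $sizeLine, 2)[0]);
        if ($sizeHex === '' || !ctype_xdigit($sizeHex)) {
            throw new UnexpectedValueException("Malformed chunk size: '$sizeLine'");
        }
        $size = hexdec($sizeHex);

        // A zero-size chunk marks the end of the body.
        if ($size === 0) {
            break;
        }

        $dataStart = $lineEnd + 2;
        if ($dataStart + $size > $len) {
            throw new UnexpectedValueException('Truncated chunk data');
        }
        $decoded .= substr($body, $dataStart, $size);
        // Skip the chunk data plus the CRLF that closes the chunk.
        $pos = $dataStart + $size + 2;
    }

    return $decoded;
}
```

For example, decode_chunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n") yields "Wikipedia".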


When working with fsockopen and the like, you have to handle the HTTP protocol yourself, which is not always as easy as one might think ;-)
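As an illustration of what "handling HTTP yourself" involves at a minimum, here is a sketch that splits the raw string read from the socket into status line, headers and body, and checks whether the body is chunked before it is parsed as XML. The helper names split_http_response and is_chunked are hypothetical; the sketch assumes the whole response is already in memory and keeps only the last value of any repeated header.

```php
<?php
/**
 * Split a raw HTTP response (e.g. as read via fsockopen) into
 * [status line, headers, body]. Minimal sketch with a hypothetical name;
 * repeated header names overwrite each other.
 */
function split_http_response(string $raw): array
{
    // Headers and body are separated by the first empty line (CRLF CRLF).
    $sep = strpos($raw, "\r\n\r\n");
    if ($sep === false) {
        throw new UnexpectedValueException('No header/body separator found');
    }
    $headerLines = explode("\r\n", substr($raw, 0, $sep));
    $statusLine = array_shift($headerLines);

    $headers = [];
    foreach ($headerLines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            // Header names are case-insensitive, so store them lowercased.
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }

    return [$statusLine, $headers, substr($raw, $sep + 4)];
}

/** True if the headers announce chunked transfer encoding. */
function is_chunked(array $headers): bool
{
    return isset($headers['transfer-encoding'])
        && stripos($headers['transfer-encoding'], 'chunked') !== false;
}
```

If is_chunked() returns true, the body still has to be run through a chunk decoder before parsing. Another option is to send the request as HTTP/1.0 instead of HTTP/1.1, since chunked transfer encoding is defined only for HTTP/1.1 and later.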

One way to avoid dealing with this yourself is to use something like cURL: it already knows the HTTP protocol, which means you won't have to reinvent the wheel ;-)
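A minimal sketch of the cURL route, assuming the curl extension is installed. The function name fetch_feed and the timeout values are hypothetical choices; RuntimeException stands in for the question's custom UrlException. cURL parses the HTTP response itself, including chunked transfer encoding, so curl_exec() returns just the decoded body.

```php
<?php
/**
 * Fetch a feed body using PHP's cURL extension (requires ext-curl).
 * Hypothetical helper; cURL handles the HTTP protocol details,
 * including de-chunking, and returns only the body.
 */
function fetch_feed(string $url, int $timeout = 5): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,      // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeout,  // like fsockopen's 5-second timeout
        CURLOPT_TIMEOUT        => $timeout * 2,
    ]);
    $body = curl_exec($ch);
    $error = curl_error($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($body === false) {
        throw new RuntimeException("cURL error on $url: $error");
    }
    if ($status >= 400) {
        throw new RuntimeException("HTTP $status on $url");
    }
    return $body;
}
```

Usage would be along the lines of $xml = fetch_feed('http://example.com/feed'); (placeholder URL), after which the body can be parsed as XML without any chunk-size lines in it.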

Pascal MARTIN
Heh, but I like to reinvent the wheel! ;) Thanks for the info, that helped a lot!
Ineluki
Well, reinventing the wheel can be fun, if you have a lot of time ;-) You're welcome :-)
Pascal MARTIN