I read a URL with fsockopen() and fread(), and I get this kind of data:

      <li
10 
></li>
      <li
9f 
>asd</li>

d  
          <li
92

Which is totally messed up O_O

--

When I use the file_get_contents() function, I get this kind of data:

<li></li>
      <li>asd</li>

Which is correct! So, what the HELL is wrong? I tried on my Windows server and my Linux server, and both behave the same. And they don't even have the same PHP version.

--

My PHP code is:

$fp = @fsockopen($hostname, 80, $errno, $errstr, 30);
if(!$fp){
 return false;
}else{
 $out = "GET /$path HTTP/1.1\r\n";
 $out .= "Host: $hostname\r\n";
 $out .= "Accept-language: en\r\n";
 $out .= "Connection: Close\r\n\r\n";
 fwrite($fp, $out);

 $data = "";
 while(!feof($fp)){
  $data .= fread($fp, 1024);
 }
 fclose($fp);
}

Any help/tips are appreciated; I've been wondering about this the whole day now :/

Oh, and I can't use fopen() or file_get_contents() because the server where my script runs doesn't have fopen wrappers enabled >_<

I really want to know how to fix this, just out of curiosity. And I don't think I can use any extra libraries on this server anyway.

+1  A: 

You probably want to use cURL.

<?php
// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// fetch the URL; with CURLOPT_RETURNTRANSFER set, the response is returned as a string
$output = curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);
?>
Chacha102
any other ideas?
Oops ... forgot to add the part that gives it to you in a variable. But I'm not exactly sure what is happening with fread. Most people use cURL for files outside of the server.
Chacha102
Thanks for your help, I'm using curl now, with the other implementations as a fallback if the curl functions are not found.
A: 

Hi,

About your "strange data" problem, this might be because the server you are requesting data from is transferring it in chunked mode.

You can take a look at the HTTP headers when calling the same URL in your browser; one of those headers might look like this:

Transfer-Encoding: chunked


Quoting Wikipedia's article on the matter:

Each non-empty chunk starts with the number of octets of the data it embeds (size written in hexadecimal) followed by a CRLF (carriage return and line feed), and the data itself. The chunk is then closed with a CRLF. In some implementations, white space characters (0x20) are padded between chunk-size and the CRLF.

The last chunk is a single line, simply made of the chunk-size (0), some optional padding white spaces and the terminating CRLF. It is not followed by any data, but optional trailers can be sent using the same syntax as the message headers.

The message is finally closed by a final CRLF combination.

This looks close to what you are getting... So I'm guessing this is the problem.
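
For what it's worth, the format quoted above can be decoded by hand in a few lines. Here is a minimal sketch, not a complete HTTP client: decode_chunked_body() is a made-up name, it assumes the HTTP headers have already been stripped from the response, and it ignores any trailers after the last chunk.

```php
<?php
// Hypothetical helper (not a PHP built-in): decode a chunked HTTP body.
// $body must be the response body only, i.e. everything after the blank
// line that ends the headers.
function decode_chunked_body($body)
{
    $decoded = '';
    $pos = 0;
    $len = strlen($body);

    while ($pos < $len) {
        // The chunk-size line ends at the next CRLF.
        $eol = strpos($body, "\r\n", $pos);
        if ($eol === false) {
            break; // malformed or truncated input
        }
        $sizeLine = substr($body, $pos, $eol - $pos);

        // Drop optional chunk extensions (";name=value") and padding spaces.
        $semi = strpos($sizeLine, ';');
        if ($semi !== false) {
            $sizeLine = substr($sizeLine, 0, $semi);
        }
        $size = (int) hexdec(trim($sizeLine));

        if ($size === 0) {
            break; // last chunk reached; trailers are ignored in this sketch
        }

        // Copy the chunk data, then skip the CRLF that closes the chunk.
        $decoded .= substr($body, $eol + 2, $size);
        $pos = $eol + 2 + $size + 2;
    }

    return $decoded;
}

// Example: two chunks ("Wiki" and "pedia") followed by the last chunk.
echo decode_chunked_body("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"), "\n";
```

The sizes are hexadecimal, which is why values like "10" and "9f" show up in the middle of the markup in the question.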


As far as I remember, curl knows how to deal with that, so the easy way would be to use curl instead of fsockopen and the like.

And using curl is often a better idea than using sockets: it will deal with many problems you might encounter, like this one ;-)


Another idea, if you don't have curl enabled on your server, would be to use some already existing library based on fsockopen, hoping it already takes care of this kind of thing for you.

For instance, I've worked with Snoopy a couple of times; maybe it already knows how to deal with that?
(Not sure: you'll have to test it yourself, or take a look at the documentation to find out.)
Still, if you want to deal with the mysteries of the HTTP protocol by yourself... Well, I wish you luck!

Pascal MARTIN
Thanks! Now it makes sense. Even though I've got curl on my server, I will write a fix for this fsockopen() problem as well, or my code might not work properly everywhere. And I'm not really interested in including some huge library in my code, since this seems to be a simple thing to handle after all.
OK! Have fun, then ;-)
Pascal MARTIN
A: 

With fsockopen(), you get the raw TCP data, not the HTTP contents. I assume you also see the HTTP headers, right? If it's in chunked encoding, you will get all the chunk headers.

This is a known issue. Someone posted a solution here on how to remove chunk headers.
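
To illustrate the "raw TCP data" point: what comes back from the socket is the status line, the headers, a blank line, and then the body, so the headers have to be separated first to know whether the body is chunked. A rough sketch, assuming the whole response is already in a string; split_http_response() and the array keys it returns are made-up names for illustration:

```php
<?php
// Hypothetical helper (not a PHP built-in): split a raw HTTP response, as
// read from an fsockopen() handle, into status line, headers and body,
// and report whether the body uses chunked transfer encoding.
function split_http_response($raw)
{
    // Headers and body are separated by the first empty line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $body = isset($parts[1]) ? $parts[1] : '';

    $lines = explode("\r\n", $parts[0]);
    $statusLine = array_shift($lines);

    // Header names are case-insensitive, so store them lowercased.
    $headers = array();
    foreach ($lines as $line) {
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }

    $chunked = isset($headers['transfer-encoding'])
        && stripos($headers['transfer-encoding'], 'chunked') !== false;

    return array(
        'status'  => $statusLine,
        'headers' => $headers,
        'body'    => $body,
        'chunked' => $chunked,
    );
}

// Example with a tiny hand-written response.
$raw = "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n4\r\nWiki\r\n0\r\n\r\n";
$response = split_http_response($raw);
var_dump($response['chunked']);
```

When 'chunked' is true, the body still contains the chunk size lines and needs to be decoded before use.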

ZZ Coder
A: 

How can I get the Content-Length of a web page using fsockopen()?
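
One possible approach, sketched under assumptions: after sending the request, read the response headers line by line and look for a Content-Length header. read_content_length() is a made-up name, and $fp is assumed to be an open handle such as the one returned by fsockopen(). Note that a chunked response normally has no Content-Length header at all, so the function can return null.

```php
<?php
// Hypothetical helper (not a PHP built-in): read response headers from an
// open stream and return the Content-Length value as an integer, or null
// when the header is absent. Reading stops at the blank line that ends the
// headers, so the stream is left positioned at the start of the body.
function read_content_length($fp)
{
    $length = null;

    while (($line = fgets($fp)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            break; // blank line: end of headers
        }
        // Header names are case-insensitive.
        if (stripos($line, 'Content-Length:') === 0) {
            $length = (int) trim(substr($line, strlen('Content-Length:')));
        }
    }

    return $length;
}

// Example using an in-memory stream in place of a real socket.
$fp = fopen('php://memory', 'r+');
fwrite($fp, "HTTP/1.1 200 OK\r\nContent-Length: 1234\r\n\r\n");
rewind($fp);
var_dump(read_content_length($fp));
fclose($fp);
```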

Sharefiles

james