I'm using PHP's cURL functions to read profiles from steampowered.com. The data retrieved is XML, and only roughly the first 1000 bytes are needed.

The method I'm using is to add a Range header, which I read about in a Stack Overflow answer ("curl: How to limit size of GET?"). I also tried CURLOPT_RANGE, but that didn't work either.

<?php
$curl_url = 'http://steamcommunity.com/id/edgen?xml=1';
$curl_handle = curl_init($curl_url);

curl_setopt ($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt ($curl_handle, CURLOPT_HTTPHEADER, array("Range: bytes=0-1000"));

$data_string = curl_exec($curl_handle);

echo $data_string;

curl_close($curl_handle);
?>

When this code is executed, it returns the whole document rather than just the requested range.

I'm using PHP Version 5.2.14.

+2  A: 

The server does not honor the Range header. The best you can do is to cancel the connection as soon as you receive more data than you want. Example:

<?php
$curl_url = 'http://steamcommunity.com/id/edgen?xml=1';
$curl_handle = curl_init($curl_url);

$data_string = "";

// Called by cURL for every chunk of the response body that arrives.
function write_function($handle, $data) {
    global $data_string;
    $data_string .= $data;
    if (strlen($data_string) > 1000) {
        // Returning a count other than strlen($data) aborts the transfer.
        return 0;
    }
    return strlen($data);
}

curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_WRITEFUNCTION, 'write_function');

// curl_exec() returns false when the callback aborts the transfer; that is expected here.
curl_exec($curl_handle);
curl_close($curl_handle);

echo $data_string;

Perhaps more cleanly, you could use the http wrapper (which also uses curl if PHP was compiled with --with-curlwrappers). You would call fread in a loop and then fclose the stream once you have as much data as you want. If allow_url_fopen is disabled, you could instead use a transport stream: open it with fsockopen instead of fopen and send the headers manually.
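A minimal sketch of the fread loop described above, assuming allow_url_fopen is enabled. The helper name read_prefix and the 1000-byte limit are illustrative choices, not part of PHP or of the original code. The helper takes any open stream, so it can be tried against a local stream before pointing it at a URL.

```php
<?php
// Hypothetical helper: read at most $limit bytes from an open stream.
function read_prefix($stream, $limit) {
    $data = '';
    while (strlen($data) < $limit && !feof($stream)) {
        // Never ask for more than is still needed.
        $chunk = fread($stream, $limit - strlen($data));
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing more to read
        }
        $data .= $chunk;
    }
    return $data;
}

// Usage with the http wrapper (assumes allow_url_fopen is on):
// $stream = fopen('http://steamcommunity.com/id/edgen?xml=1', 'r');
// if ($stream !== false) {
//     $data_string = read_prefix($stream, 1000);
//     fclose($stream); // stop here; the rest of the body is never read
//     echo $data_string;
// }
```

Unlike the callback version, this never holds more than the requested number of bytes, because each fread asks only for what is still missing.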

Artefacto
This did the trick, although I don't completely understand the mechanics of CURLOPT_WRITEFUNCTION. Can you explain what's going on there? Thanks again.
Curtis
@Cur It's a callback that the curl extension calls every time new data is received. The callback receives the curl handle and the data that was just read. It should return the number of bytes it handled; if it returns anything else, the transfer is aborted (this last part doesn't seem to be documented, but it appears to be the behavior).
Artefacto
@Cur OK I found the docs here: "Return the number of bytes actually taken care of. If that amount differs from the amount passed to your function, it'll signal an error to the library. This will abort the transfer and return CURLE_WRITE_ERROR." http://curl.haxx.se/libcurl/c/curl_easy_setopt.html
Artefacto
@Artefact Thank you, that is extremely helpful! I would vote up your comments and your answer, but I don't have enough rep yet :(
Curtis