ansaurus

Question

Retrieve partial web page

Answer 1

A:

It will download the whole page with the fopen call, but then it will only read 6kb from that page.

From the PHP manual:

Reading stops as soon as one of the following conditions is met:

length bytes have been read

James Skidmore 2009-10-08 16:35:34

Answer 2

+3 A:

This is more an HTTP that a CURL question in fact.

As you guessed, the whole page is going to be downloaded if you use fopen. No matter then if you seek at offset 5000 or not.

The best way to achieve what you want would be to use a partial HTTP GET request, as stated in HTML RFC (http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html):

The semantics of the GET method change to a "partial GET" if the request message includes a Range header field. A partial GET requests that only part of the entity be transferred, as described in section 14.35. The partial GET method is intended to reduce unnecessary network usage by allowing partially-retrieved entities to be completed without transferring data already held by the client.

The details of partial GET requests using Ranges is described here: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.2

Aurélien Vallée 2009-10-08 16:37:08

Answer 3

+1 A:

try a HTTP RANGE request:

GET /largefile.html HTTP/1.1
Range: bytes=0-6000

if the server supports range requests, it will return a 206 Partial Content response code with a Content-Range header and your requested range of bytes (if it doesn't, it will return 200 and the whole file). see http://benramsey.com/archives/206-partial-content-and-range-requests/ for a nice explanation of range requests.

see also Resumable downloads when using PHP to send the file?.

ax 2009-10-08 16:48:31

Answer 4

+1 A:

You may be able to also accomplish what you're looking for using CURL as well.

If you look at the documentation for CURLOPT_WRITEFUNCTION you can register a callback that is called whenever data is available for reading from CURL. You could then count the bytes received, and when you've received over 6,000 bytes you can return 0 to abort the rest of the transfer.

The libcurl documentation describes the callback a bit more:

This function gets called by libcurl as soon as there is data received that needs to be saved. Return the number of bytes actually taken care of. If that amount differs from the amount passed to your function, it'll signal an error to the library and it will abort the transfer and return CURLE_WRITE_ERROR.

The callback function will be passed as much data as possible in all invokes, but you cannot possibly make any assumptions. It may be one byte, it may be thousands.

Keith Palmer 2009-10-08 17:43:37

I've marked this as the accepted answer as it is more robust than the HTTP Range request which may not always be supported, and I can only mark one answer.

James 2009-10-09 16:29:12

ansaurus

tags:

views:

answers:

Retrieve partial web page

related questions