I recently looked into the possibility of making multiple requests with curl. I may not be understanding it fully, so I am just hoping to clarify some concepts.

It's definitely a good option if you are fetching content from multiple sources: you can start processing results from the faster servers while still waiting for the slower ones. But does it still make sense when you are requesting multiple pages from the same server? Would the server serve multiple pages at a time to the same client?

A: 

I think most or all servers will serve more than one page at a time to the same client. You could set a reasonable timeout for your connections; then, if one fails to connect, push it back onto your connection array to be retried after all the others have been tried. That way you'll always be getting at least one page at a time, even while trying to get several. Does that make sense? :)
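A rough sketch of that timeout-and-retry idea, in case it helps. The function name `fetchWithRetry`, the `$maxPasses` and `$timeout` parameters, and their default values are made up for illustration; only the `curl_*` calls are real PHP functions. Note that a transfer counts as successful here whenever curl itself reports no error, so an HTTP 404 is still "ok".

```php
<?php
// Sketch only: fetch URLs in parallel and re-queue the ones that fail.
// fetchWithRetry(), $maxPasses and $timeout are illustrative names/values.
function fetchWithRetry(array $urls, int $maxPasses = 3, int $timeout = 10): array
{
    $ok = [];
    $queue = array_values(array_unique($urls));

    for ($pass = 0; $pass < $maxPasses && $queue; $pass++) {
        $attempted = $queue;
        $mh = curl_multi_init();
        foreach ($attempted as $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_CONNECTTIMEOUT => $timeout,     // give up connecting after this many seconds
                CURLOPT_TIMEOUT        => $timeout * 3, // cap on the whole transfer
                CURLOPT_PRIVATE        => $url,         // remember which URL this handle belongs to
            ]);
            curl_multi_add_handle($mh, $ch);
        }

        // Drive every transfer until none is still running.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh, 1.0);
            }
        } while ($running && $status === CURLM_OK);

        // Collect the transfers that finished without a curl error.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            if ($info['result'] === CURLE_OK) {
                $ok[curl_getinfo($ch, CURLINFO_PRIVATE)] = curl_multi_getcontent($ch);
            }
            curl_multi_remove_handle($mh, $ch);
        }
        curl_multi_close($mh);

        // Everything that did not succeed goes around again on the next pass.
        $queue = array_values(array_diff($attempted, array_keys($ok)));
    }

    return ['ok' => $ok, 'failed' => $queue];
}
```

Calling `fetchWithRetry($urls)` returns the page bodies keyed by URL under `'ok'`, plus whatever still failed after the last pass under `'failed'`.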

GZipp
A: 

Some servers might be configured to behave defensively if too many connections or requests come from what they believe is the same client. They might drop or reject connections, cap bandwidth at some aggregate total across all your connections, or take other measures.

Regardless, be as considerate as you would want a web crawler to be to your own site, and try not to bombard a single server with too much at once.

If you need to fetch 5 pages each, from 5 different servers, you're much more likely to finish faster if you use 1 connection to each server until done, than if you did 5 connections to 1 server until done.
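One way to get that "one connection per server" pattern is to reorder the URL list so that neighbouring entries point at different hosts, then fetch it in batches. A minimal sketch; `interleaveByHost` is a name invented for this example, and the `example` hosts are placeholders:

```php
<?php
// Sketch only: reorder URLs round-robin by host, so that consecutive
// requests go to different servers. interleaveByHost() is an illustrative name.
function interleaveByHost(array $urls): array
{
    // Group the URLs by host name, keeping their original order per host.
    $byHost = [];
    foreach ($urls as $url) {
        $host = parse_url($url, PHP_URL_HOST) ?: '';
        $byHost[$host][] = $url;
    }

    // Take one URL from each host in turn until every group is empty.
    $ordered = [];
    while ($byHost) {
        foreach (array_keys($byHost) as $host) {
            $ordered[] = array_shift($byHost[$host]);
            if (!$byHost[$host]) {
                unset($byHost[$host]);
            }
        }
    }
    return $ordered;
}

$ordered = interleaveByHost([
    'http://a.example/1', 'http://a.example/2',
    'http://b.example/1', 'http://b.example/2',
]);
// $ordered is now a/1, b/1, a/2, b/2
```

With 5 servers and 5 pages each, `array_chunk($ordered, 5)` would then give batches that contain exactly one URL per server.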

chris
+1  A: 

You can't do multi-threading in PHP, so you won't be able to start processing one page while the others are still being retrieved. With the usual run-until-finished loop, multi-curl won't return control until all pages are retrieved or have timed out, so it will take as long as the slowest page takes to be retrieved. Still, you are going from serial (curl) to parallel (multi-curl), which will give you a big boost.

Servers will serve multiple pages to the same client up to a certain configured limit. Requesting 5-10 pages at once from one server should be fine.
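If you would rather enforce a cap like that on your own side, libcurl can limit the number of parallel connections itself. A small sketch, assuming PHP 7.0.7 or later built against libcurl 7.30.0 or later (as far as I know that is when these two options became available); `newCappedMulti` and the default limits are invented for the example:

```php
<?php
// Sketch only: a multi handle that caps parallel connections.
// newCappedMulti() and the default limits are illustrative assumptions.
function newCappedMulti(int $perHost = 5, int $total = 20)
{
    $mh = curl_multi_init();
    // Extra transfers to the same host wait until a connection slot frees up.
    curl_multi_setopt($mh, CURLMOPT_MAX_HOST_CONNECTIONS, $perHost);
    // Overall cap across all hosts handled by this multi handle.
    curl_multi_setopt($mh, CURLMOPT_MAX_TOTAL_CONNECTIONS, $total);
    return $mh;
}
```

You can still add as many easy handles as you like to the returned handle; the ones over the limit are simply held back until an earlier transfer finishes.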

Brent Baisley