tags: 
views: 416
answers: 5

I'm using cURL to fetch some data from user accounts. First it logs in and then redirects to another URL where the data resides.

My stats showed that it took an average of 14 seconds to fetch some data spread over 5 pages. I would like to speed things up. My questions are:

Is it possible to see how much time each step takes? Do you know how I could speed up/enhance cURL?

Thanks

A: 

You could try multithreading, although I'm not too sure whether that would completely work or not.
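If you want to stay inside a single PHP process, the built-in curl_multi functions give a similar effect: several transfers running at once. A minimal sketch, assuming a hypothetical list of the 5 pages from the question (URLs and options are placeholders):

<?php
// Hypothetical list of the pages mentioned in the question.
$urls = array('http://example.com/page1', 'http://example.com/page2');

$mh = curl_multi_init();
$handles = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Drive all transfers until every handle has finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);

foreach ($handles as $url => $ch) {
    $html = curl_multi_getcontent($ch);
    // ... parse $html for $url here ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

The total wall-clock time then tends toward the slowest single page rather than the sum of all five.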

Alias14
I'm using multi-processing, but I would like to speed up each cURL task as much as possible because I need to complete the DB update in 4-hour intervals.
embedded
A: 

To make the task *feel* faster: don't run it from the web, run it as a periodic task (cron job), and cache the file on disk.
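The idea is that the cron job writes the fetched pages to files, and whatever serves the data reads those files instead of running cURL on every request. A rough sketch; the cache path and the 4-hour window are assumptions taken from the comments here, not from your code:

<?php
$cacheFile = '/var/cache/myjob/page1.html'; // hypothetical path
$maxAge    = 4 * 3600;                      // matches the 4-hour update interval

if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
    // Fresh enough: serve the copy the cron job saved earlier.
    $html = file_get_contents($cacheFile);
} else {
    // Stale or missing: fetch with cURL and refresh the cache.
    $ch = curl_init('http://example.com/page1');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($ch);
    curl_close($ch);
    file_put_contents($cacheFile, $html);
}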

J-16 SDiZ
I'm running it as a cron job. What do you mean by caching the file on disk?
embedded
+1  A: 

You can't make the process of retrieving a page from a server any faster.

You can make the pages smaller, so they can download quicker. You can beef up the processing power on the servers or the connection between your server and the server the pages are on.

If you are consuming a service, what format is the data in? If it is XML, maybe it is too verbose and this is causing lots of extra kilobytes, for example.
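If the server supports it, asking for a compressed response is one cheap way to cut the transferred kilobytes without changing the pages themselves. A sketch, assuming the remote server honours gzip (the URL is a placeholder):

<?php
$ch = curl_init('http://example.com/data');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// An empty string means "send Accept-Encoding with everything cURL supports
// and decompress the response automatically".
curl_setopt($ch, CURLOPT_ENCODING, '');
$html = curl_exec($ch);
curl_close($ch);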

Sohnee
It's in HTML format. Maybe I could get rid of the images and just download the text? Is that possible using cURL?
embedded
cURL does NOT download the images.
J-16 SDiZ
OK, so are there any cURL tweaks I could try out? Is it possible to find the bottleneck step?
embedded
@embedded cURL downloads the contents of a single URL, that's it. It establishes a connection, downloads the contents and closes the connection; the bare technical minimum that needs to be done. It'll take as long as it takes to get the data moved over the network. The overhead of cURL itself is absolutely negligible.
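As for finding the bottleneck step: curl_getinfo() exposes per-transfer timings (DNS lookup, connect, time to first byte, total), so you can log them for each of the 5 pages and see where the 14 seconds actually go. A sketch with a placeholder URL:

<?php
$ch = curl_init('http://example.com/page1');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);

printf(
    "dns: %.3fs  connect: %.3fs  first byte: %.3fs  total: %.3fs\n",
    curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME),
    curl_getinfo($ch, CURLINFO_CONNECT_TIME),
    curl_getinfo($ch, CURLINFO_STARTTRANSFER_TIME),
    curl_getinfo($ch, CURLINFO_TOTAL_TIME)
);
curl_close($ch);

If most of the time is in "first byte", the remote server is slow to generate the page and there is little cURL itself can do about it.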
deceze
A: 

Split the task into 3 files:

  1. A file that retrieves the page list and acts as your main script (the one to put on crontab) (main.php)
  2. A file that parses the actual page (parse.php)
  3. A shell script that drives your 2nd script.

Then, in your 1st file, do something like this:

<?php
// Retrieve the page list using cURL, save it to a file (e.g. pagelist.txt)
// and return that file's absolute path.
$pagelist = get_page_list();

// Hand the list to the shell script; since it is a shell script,
// run it with a shell rather than the PHP binary.
exec("/bin/bash /your/3rdscript.sh < $pagelist");
?>

And here's your 3rd file:

#!/bin/bash

# Read one page URL per line and fetch each one in the background.
while read line
do
    /path/to/php /path/to/your/2ndscript.php -f "$line" &
done

Please note that in the 3rd script (the shell script) I use & (ampersand). This tells the shell to run that particular process in the background.

On your 2nd script, you can use something like this:

<?php

// The URL is passed as "-f <url>", so it arrives as the second argument.
$pageurl = $argv[2];
//do your curl process to fetch page $pageurl here

Using the steps above, you can speed things up by fetching several pages at once.

silent
Right now I'm parsing each page retrieved from cURL. Do you think that by using your suggestion I could speed up performance by a noticeable factor?
embedded
+1  A: 

You can use ParallelCurl by Pete Warden. The source is available at http://github.com/petewarden/ParallelCurl. The module allows you to run multiple cURL URL fetches in parallel in PHP.
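Usage looks roughly like the sketch below; the class and method names are recalled from the project's README, so check the repository above for the exact signatures:

<?php
require_once('parallelcurl.php'); // from the GitHub repo above

// Called as each request completes.
function on_request_done($content, $url, $ch, $user_data) {
    echo "fetched $url: " . strlen($content) . " bytes\n";
}

$options = array(CURLOPT_FOLLOWLOCATION => true);
$parallel_curl = new ParallelCurl(10, $options); // at most 10 requests in flight

$parallel_curl->startRequest('http://example.com/page1', 'on_request_done');
$parallel_curl->startRequest('http://example.com/page2', 'on_request_done');

// Block until every outstanding request has finished.
$parallel_curl->finishAllRequests();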

retornam
Thanks, I'll take a look and see if it fits my needs. Are you using this?
embedded