tags: 
views: 416
answers: 5

I'm using cURL to fetch some data from user accounts. First it logs in and then redirects to another URL where the data resides.

My stats showed that it took an average of 14 seconds to fetch some data spread over 5 pages. I would like to speed things up. My questions are:

Is it possible to see how much time each step takes? Do you know how I could speed up/enhance cURL?

Thanks

A: 

You could try multithreading, although I'm not too sure whether that would completely work or not.
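If you want to stay inside a single PHP process, the built-in curl_multi functions give a similar effect: several transfers running at once. A minimal sketch, assuming a hypothetical list of the 5 pages from the question (URLs and options are placeholders):

<?php
// Hypothetical list of the pages mentioned in the question.
$urls = array('http://example.com/page1', 'http://example.com/page2');

$mh = curl_multi_init();
$handles = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Drive all transfers until every handle has finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);

foreach ($handles as $url => $ch) {
    $html = curl_multi_getcontent($ch);
    // ... parse $html for $url here ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

The total wall-clock time then tends toward the slowest single page rather than the sum of all five.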

Alias14
I'm using multi-processing, but I would like to speed up each cURL task as much as possible because I need to complete the DB update in 4-hour intervals.
embedded
A: 

To make the task *feel* faster: don't run it from the web, run it as a periodic task (cron job), and cache the file on disk.
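The idea is that the cron job writes the fetched pages to files, and whatever serves the data reads those files instead of running cURL on every request. A rough sketch; the cache path and the 4-hour window are assumptions taken from the comments here, not from your code:

<?php
$cacheFile = '/var/cache/myjob/page1.html'; // hypothetical path
$maxAge    = 4 * 3600;                      // matches the 4-hour update interval

if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
    // Fresh enough: serve the copy the cron job saved earlier.
    $html = file_get_contents($cacheFile);
} else {
    // Stale or missing: fetch with cURL and refresh the cache.
    $ch = curl_init('http://example.com/page1');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($ch);
    curl_close($ch);
    file_put_contents($cacheFile, $html);
}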

J-16 SDiZ
I'm running it as a cron job. What do you mean by caching the file on disk?
embedded
+1  A: 

You can't make the process of retrieving a page from a server any faster.

You can make the pages smaller, so they can download quicker. You can beef up the processing power on the servers or the connection between your server and the server the pages are on.

If you are consuming a service, what format is the data in? If it is XML, maybe it is too verbose and this is causing lots of extra kilobytes, for example.
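If the server supports it, asking for a compressed response is one cheap way to cut the transferred kilobytes without changing the pages themselves. A sketch, assuming the remote server honours gzip (the URL is a placeholder):

<?php
$ch = curl_init('http://example.com/data');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// An empty string means "send Accept-Encoding with everything cURL supports
// and decompress the response automatically".
curl_setopt($ch, CURLOPT_ENCODING, '');
$html = curl_exec($ch);
curl_close($ch);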

Sohnee
It's in HTML format. Maybe I could get rid of the images and just download the text? Is that possible using cURL?
embedded
cURL does NOT download the images.
J-16 SDiZ
OK, so are there any cURL tweaks I could try out? Is it possible to find the bottleneck step?
embedded
@embedded cURL downloads the contents of a single URL, that's it. It establishes a connection, downloads the contents and closes the connection; the bare technical minimum that needs to be done. It'll take as long as it takes to get the data moved over the network. The overhead of cURL itself is absolutely negligible.
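As for finding the bottleneck step: curl_getinfo() exposes per-transfer timings (DNS lookup, connect, time to first byte, total), so you can log them for each of the 5 pages and see where the 14 seconds actually go. A sketch with a placeholder URL:

<?php
$ch = curl_init('http://example.com/page1');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);

printf(
    "dns: %.3fs  connect: %.3fs  first byte: %.3fs  total: %.3fs\n",
    curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME),
    curl_getinfo($ch, CURLINFO_CONNECT_TIME),
    curl_getinfo($ch, CURLINFO_STARTTRANSFER_TIME),
    curl_getinfo($ch, CURLINFO_TOTAL_TIME)
);
curl_close($ch);

If most of the time is in "first byte", the remote server is slow to generate the page and there is little cURL itself can do about it.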
deceze
A: 

Split the task into 3 files:

  1. A file that retrieves the page list and acts as your main script (the one to put on crontab) (main.php)
  2. A file that parses the actual page (parse.php)
  3. A shell script that drives your 2nd script.

Then, in your 1st file, do something like this:

<?php
// Retrieve the page list using cURL, save it to a file (e.g. pagelist.txt)
// and return that file's absolute path.
$pagelist = get_page_list();

// Hand the list to the shell script; since it is a shell script,
// run it with a shell rather than the PHP binary.
exec("/bin/bash /your/3rdscript.sh < $pagelist");
?>

And here's your 3rd file:

#!/bin/bash

# Read one page URL per line and fetch each one in the background.
while read line
do
    /path/to/php /path/to/your/2ndscript.php -f "$line" &
done

Please note that in the 3rd script (the shell script) I use & (ampersand). This tells the shell to run that particular process in the background.

On your 2nd script, you can use something like this:

<?php

// The URL is passed as "-f <url>", so it arrives as the second argument.
$pageurl = $argv[2];
//do your curl process to fetch page $pageurl here

Using the steps above, you can speed things up by fetching several pages at once.

silent
Right now I'm parsing each page retrieved from cURL. Do you think that by using your suggestion I could speed up performance by a noticeable factor?
embedded
+1  A: 

You can use ParallelCurl by Pete Warden. The source is available at http://github.com/petewarden/ParallelCurl. The module allows you to run multiple cURL URL fetches in parallel in PHP.
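Usage looks roughly like the sketch below; the class and method names are recalled from the project's README, so check the repository above for the exact signatures:

<?php
require_once('parallelcurl.php'); // from the GitHub repo above

// Called as each request completes.
function on_request_done($content, $url, $ch, $user_data) {
    echo "fetched $url: " . strlen($content) . " bytes\n";
}

$options = array(CURLOPT_FOLLOWLOCATION => true);
$parallel_curl = new ParallelCurl(10, $options); // at most 10 requests in flight

$parallel_curl->startRequest('http://example.com/page1', 'on_request_done');
$parallel_curl->startRequest('http://example.com/page2', 'on_request_done');

// Block until every outstanding request has finished.
$parallel_curl->finishAllRequests();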

retornam
Thanks, I'll take a look and see if it fits my needs. Are you using this?
embedded