Hi, I have a piece of code that fetches data when given an ID. If I give it an ID of 1230, for example, the code fetches the article data with ID 1230 from an external web site and inserts it into a DB.

Now, the problem is that I need to fetch all the articles, let's say from ID 00001 to 99999. If I do a 'for' loop, after 60 seconds the PHP internal time limit stops the loop. If I use some kind of header("Location: code.php?id=00001") or header("Location: code.php?id=".$ID), increment $ID++ and then redirect to the same page, the browser stops me because of its infinite-redirect protection.

Please HELP!

A: 

If your server lets you, this is probably the best solution: just remove the time limit for this script.

set_time_limit(0);
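A rough sketch of how the whole script could look with the limit removed; FetchItem() is just a placeholder for whatever function actually does the fetch and insert:

    set_time_limit(0); // no execution time limit for this script

    for ($id = 1; $id <= 99999; $id++) {
        FetchItem($id); // placeholder for your fetch-and-insert code
    }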
Matchu
A: 

Well, there are several ways you can do this.

The best way to do this is to set up a cron to execute your scraper every X minutes.
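For example, a crontab entry along these lines (the path and the every-5-minutes schedule are just placeholders) would run the scraper regularly:

    */5 * * * * php /path/to/scraper.php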

This being said, you will need to keep track of what ID you're currently at.

So if you set up a function to write to a file, you can do it the following way (there's a sketch below):

--

Open the file (get the current ID).
Start the parser at that ID and run it 60 times.
Insert the data.
Open the file and update it with the new ID.
Close the file and exit.

This will run over the space of a few hours, or however long it takes.
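A rough sketch of that approach, assuming the current ID is kept in a plain text file called lastid.txt and FetchItem() stands in for the real fetch-and-insert function:

    // scraper.php - run from cron every X minutes
    $file = __DIR__ . '/lastid.txt';
    $id = file_exists($file) ? (int) file_get_contents($file) : 0;

    // process the next 60 IDs, then stop and wait for the next cron run
    for ($i = 1; $i <= 60 && $id < 99999; $i++) {
        $id++;
        FetchItem($id);                // placeholder for your fetch-and-insert code
        file_put_contents($file, $id); // remember where we got to
    }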

  1. If you're doing this manually, and you're sitting there refreshing every time the script finishes, then you can use sessions instead of writing the ID to a file:

    session_start();
    $id = (isset($_SESSION['position']) ? $_SESSION['position'] : 0);
    for ($i = $id; $i <= 99999; $i++)
    {
       FetchItem($i); // Or whatever function it is you use!
       // Update the id for the next run.
       $_SESSION['position'] = $i;
    }
    
  2. If you're willing to stretch your server's limits, you can extend the 60 seconds using set_time_limit(120) for 120 seconds, or whatever you prefer.

RobertPitt
...meh. Cron is really only the way to go if he has a hugely busy website that would be taken down by running that script, or if he plans to keep collecting this data continuously rather than all in one go. Really, this script should just be run on his computer rather than a remote host, anyway.
Matchu
Totally agree, scraping takes up too many server resources. If I were scraping, I would set up a cron to run during the hours my sites are least busy!
RobertPitt
I'm running this script on localhost, then I will put the DB online by importing it, so that's not a problem. The problem I have is that I don't know cron. But I think I will do it by setting set_time_limit(0).
Jonathan
A: 

If your server won't let you change the script time limit, just have your script check the database for the last inserted article in your sequence and start from there.
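A minimal sketch of that idea, assuming a PDO connection and an articles table with an id column (all names are placeholders):

    // find where the last run stopped, then continue from the next ID
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $last = (int) $pdo->query('SELECT MAX(id) FROM articles')->fetchColumn();

    for ($id = $last + 1; $id <= 99999; $id++) {
        FetchItem($id); // placeholder for the fetch-and-insert code
    }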

Another approach: use JavaScript ("window.location = ...") instead of a header() to redirect.
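A sketch of that trick: process one ID per request, then have the page send the browser on to the next ID with JavaScript, so each step is a fresh page load rather than an HTTP redirect chain. The code.php?id=... URL is taken from the question; FetchItem() is a placeholder:

    // code.php?id=00001 - process one article per page load, then move on
    $id = isset($_GET['id']) ? (int) $_GET['id'] : 1;
    FetchItem($id); // placeholder for the fetch-and-insert code

    if ($id < 99999) {
        echo '<script>window.location = "code.php?id=' . ($id + 1) . '";</script>';
    } else {
        echo 'Done.';
    }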

Robert
How do I check the last row and then run the code without redirecting to the same page? Doing a loop will give the time limit error.
Jonathan
Sorry, I wasn't very clear. I meant to do the loop and let it run until it times out, then just run it again and let it pick up from where it left off. If that's not practical (because it would take too many runs, or because this process needs to be completely automated), then the JavaScript I mentioned is a better way to go.
Robert