I'm building a spider that will traverse various sites and data-mine them.
Since I need to fetch each page separately, this could take a very long time (maybe 100 pages). I've already used set_time_limit to allow 2 minutes per page, but it seems Apache will kill the script after 5 minutes no matter what.
This isn't usually a problem, since the spider will run from cron or something similar, which does not have this time limit. However, I would also like the admins to be able to start a fetch manually via an HTTP interface.
It is not important that Apache is kept alive for the full duration; I'm going to use AJAX to trigger a fetch and then check back once in a while, also with AJAX.
My problem is how to start the fetch from within a PHP script without the fetch being terminated when the script that started it dies.
Maybe I could use system('script.php &'), but I'm not sure it will do the trick. Any other ideas?
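
To make the idea concrete, here is a rough sketch of the kind of thing I have in mind. It assumes a Unix-like host with php and nohup on the PATH; the function names and the fetch.php worker script are made up for illustration. I have not confirmed that a process started this way survives Apache killing the request that launched it, which is really the question.

```php
<?php
// Hypothetical helper (not an existing API): builds a shell command that
// runs a PHP CLI script in the background.
// Assumption: Unix-like shell with `php` and `nohup` available on PATH.
function buildBackgroundCommand(string $script, array $args = [], string $logFile = '/dev/null'): string
{
    $parts = ['nohup', 'php', escapeshellarg($script)];
    foreach ($args as $arg) {
        $parts[] = escapeshellarg((string) $arg);
    }

    // Redirecting stdout and stderr lets exec() return without waiting for
    // the child to finish; the trailing "&" puts the child in the
    // background, and "echo $!" prints its PID.
    return implode(' ', $parts) . ' > ' . escapeshellarg($logFile) . ' 2>&1 & echo $!';
}

// Hypothetical launcher: starts the script and returns the PID reported by
// the shell, or 0 if no PID came back.
function startBackgroundFetch(string $script, array $args = []): int
{
    $output = [];
    exec(buildBackgroundCommand($script, $args), $output);

    return (int) ($output[0] ?? 0);
}
```

The HTTP handler would then call something like startBackgroundFetch('fetch.php', ['42']) and return right away, and the later AJAX calls would poll for progress that the worker writes to a file or database.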