I know we can grab information (with PHP) from any site and build our own pages from it.
I'm talking about parsing additional content such as movie information (dates, budget, cast, etc.) or video properties from YouTube (size, duration).
I'm interested in how to run this grabbing process against big sites and large amounts of information.
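For the basic grab-and-parse step I mean something like the sketch below (the URL and the XPath expressions are just placeholders, not a real movie site):

```php
<?php
// Minimal sketch of the "grab and parse" step.
// The URL and the XPath queries are placeholders.

function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 20,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; MyGrabber/1.0)',
    ]);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? '' : $html;
}

$html = fetchPage('http://example.com/movie/123');

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely valid
$dom->loadHTML($html);
libxml_clear_errors();

$xpath  = new DOMXPath($dom);
$title  = $xpath->evaluate('string(//h1)');
$budget = $xpath->evaluate('string(//span[@class="budget"])');

echo "$title / $budget\n";
```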
There seem to be several problems:
- Script execution time. We could write a rotation script that grabs the pages one after another and pushes the content into our MySQL database, but with a large number of pages the execution time will exceed what ordinary hosting allows (usually around 30 seconds), so the script will die at some point (see the sketch after this list).
- Memory usage. The script will consume a lot of memory while parsing a large number of pages.
- Anti-DDoS protection on the target site (too many requests from one IP address).
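To work around these limits, I was thinking of something along these lines: run the script repeatedly (e.g. from cron), store the current position in the database, process only a small batch per run, free memory after each page, and sleep between requests so the target site doesn't see a flood from one IP. A rough sketch (the table, column names, and the fetchPage()/parsePage() helpers are made up):

```php
<?php
// Rough sketch of one "rotation" run: resume from a stored position,
// process a small batch, then exit before hitting the time limit.
// Table/column names and fetchPage()/parsePage() are placeholders.

$pdo = new PDO('mysql:host=localhost;dbname=grabber', 'user', 'pass');

// Where did the previous run stop?
$lastId = (int) $pdo->query('SELECT MAX(page_id) FROM grabbed')->fetchColumn();

$batchSize = 20;   // small enough to finish within ~30 seconds
$insert = $pdo->prepare('INSERT INTO grabbed (page_id, data) VALUES (?, ?)');

for ($id = $lastId + 1; $id <= $lastId + $batchSize; $id++) {
    $html = fetchPage("http://example.com/movie/$id");   // see the earlier sketch
    if ($html === '') {
        continue;                  // skip failed requests, retry on a later run
    }

    $data = parsePage($html);      // hypothetical: extract title, budget, etc.
    $insert->execute([$id, json_encode($data)]);

    unset($html, $data);           // keep memory usage flat between pages
    sleep(2);                      // be polite: don't hammer the site from one IP
}
```

A cron job could then call this script every minute, and each run advances through the next batch; running it from the command line would also avoid the 30-second limit, since the PHP CLI has no max_execution_time by default.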
The main point of this question is how to get around all these obstacles and build a rotation script that can run all day long without errors.
Are there any other pitfalls we might run into along the way?
Your thoughts?