Requirements

  • Written in PHP
  • Control over the code (open source would be awesome, purchasing code is an option too)

Optional features

  1. Respects robots.txt
  2. Automatic rate limiting
  3. Scrape based on rules into a data object
    • Admin interface, or configurable back end, to set up new rules
    • Something like CSS selectors to pick out data in the rules
    • Periodic updates, scheduled by importance
  4. Logs errors / alerts an appropriate party when rules need updating
  5. Written with the PHP Symfony framework would be astounding, but I'm not expecting this
  6. MySQL backend
  7. Other things I'm not thinking of that are important to screen scraping in general
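
To make optional feature 3 concrete, here is a minimal sketch of rule-based extraction into a data object. It assumes PHP's bundled DOM extension and uses XPath expressions in place of CSS selectors; the function name `scrapeWithRules`, the rule names, and the sample HTML are all hypothetical.

```php
<?php
// Hypothetical helper: applies a map of field name => XPath expression
// to an HTML string and returns one record (an associative array).
function scrapeWithRules(string $html, array $rules): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so malformed real-world markup
    // does not emit PHP warnings, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $record = [];
    foreach ($rules as $field => $expression) {
        $nodes = $xpath->query($expression);
        // A rule that matches nothing yields null; a caller could log this
        // as a hint that the rule needs updating (optional feature 4).
        $record[$field] = ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : null;
    }
    return $record;
}

// Sample input and rules, invented for illustration only.
$html = '<html><body><h1 class="title">Widget</h1>'
      . '<span id="price">9.99</span></body></html>';
$rules = [
    'title' => '//h1[@class="title"]',
    'price' => '//*[@id="price"]',
    'stock' => '//*[@id="stock"]', // deliberately matches nothing
];
$record = scrapeWithRules($html, $rules);
```

Keeping the rules as plain data is what would let them live in a MySQL table and be edited from an admin interface, rather than being hard-coded per site.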

I know I won't get everything I want in the optional features - I'm mainly looking for something decently developed rather than re-inventing the wheel.

I've seen pieces like PHP Simple HTML DOM Parser mentioned in the question "HTML Scraping in Php". I will build a custom solution if needed, so anything that might help, even if it is not a complete solution, is appreciated.
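
For the custom route, optional features 1 and 2 could start from something like the sketch below. It is deliberately simplified: it reads only `Disallow` lines in groups addressed to `User-agent: *`, treats each as a plain path prefix, and ignores `Allow` lines and wildcards. All function names are hypothetical.

```php
<?php
// Collects Disallow path prefixes from groups that apply to all agents ("*").
function disallowedPrefixes(string $robotsTxt): array
{
    $prefixes = [];
    $applies = false;    // does the current group apply to us?
    $inAgentRun = false; // true while reading consecutive User-agent lines
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$key, $value] = array_map('trim', explode(':', $line, 2));
        $key = strtolower($key);
        if ($key === 'user-agent') {
            if (!$inAgentRun) {
                $applies = false; // a new group starts here
            }
            $applies = $applies || $value === '*';
            $inAgentRun = true;
        } else {
            $inAgentRun = false;
            if ($key === 'disallow' && $applies && $value !== '') {
                $prefixes[] = $value;
            }
        }
    }
    return $prefixes;
}

// True when the path does not start with any disallowed prefix.
function isAllowed(string $path, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return false;
        }
    }
    return true;
}

// Seconds still to wait so that requests are at least $minInterval apart.
function secondsToWait(float $lastRequestAt, float $now, float $minInterval): float
{
    return max(0.0, $minInterval - ($now - $lastRequestAt));
}

// Sample robots.txt, invented for illustration only.
$robots = "User-agent: *\nDisallow: /private/\n\nUser-agent: BadBot\nDisallow: /\n";
$prefixes = disallowedPrefixes($robots);
```

In a real crawler the wait would be applied per host before each request, for example with `usleep((int) (secondsToWait($last, microtime(true), 2.0) * 1000000))`.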

A: 

This is not really an answer so much as a pointer, but there is a really good book for PHP called

Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL http://amzn.to/aqoKsV

I know it is kind of a cruddy answer, but I will say this: I was looking for the same thing a few months back, and this book is great. Trust me, you won't be disappointed; it covers everything you said you wanted.

-thanks

BrandonS
The book really is everything you are looking for. The author includes custom scripts he has written that are great and very well documented, so they are easy to modify. You should really check it out.
BrandonS