Hi,

I know Scrapy (scrapy.org), a fast, high-level screen scraping and web crawling framework used to crawl websites and extract structured data from their pages. I have used it in some projects and it is very simple to use, but it is written in Python.

My question is: are there similar frameworks for PHP?

+3  A: 

Snoopy

Galen
Thanks, but I cannot find examples. Is it possible to extract data from specific HTML elements? E.g. give it a pattern to extract all data located in html/body/div/
ArtWorkAD
A: 

You may find that SimpleHTMLDom (http://simplehtmldom.sourceforge.net/) combined with cURL is enough for your needs.

D Roddis
Suggested third-party alternatives to [SimpleHtmlDom](http://simplehtmldom.sourceforge.net/) that actually use [DOM](http://php.net/manual/en/book.dom.php) instead of string parsing: [phpQuery](http://code.google.com/p/phpquery/), [Zend_Dom](http://framework.zend.com/manual/en/zend.dom.html), [QueryPath](http://querypath.org/) and [FluentDom](http://www.fluentdom.org).
Gordon
Don't use SimpleHtmlDom. It is slow and offers less functionality than the built-in DOM libraries.
Byron Whitlock
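For reference, here is a minimal sketch of what extracting everything under html/body/div could look like with PHP's built-in DOM extension (`DOMDocument` plus `DOMXPath`). The function name `extractBodyDivs` and the sample markup are made up for illustration and do not come from any of the libraries mentioned in this thread. It assumes the HTML is already available as a string.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <div>
// that is a direct child of <body>, i.e. the XPath /html/body/div.
function extractBodyDivs(string $html): array
{
    // Real-world HTML is often malformed; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query('/html/body/div') as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Made-up sample page, only to show the call.
$sample = '<html><body><div>first</div><p>skip</p><div>second</div></body></html>';
print_r(extractBodyDivs($sample));
```

In a real scraper the `$html` string would probably come from cURL or `file_get_contents()`, and the XPath expression can be swapped for any other pattern.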
+1  A: 

You may find ScraperWiki interesting. It has tutorials on various scraping topics and an online generator for scraper scripts in PHP, Python and other languages.

Gordon