Is there a PHP class/library that would allow me to query an XHTML document with CSS selectors? I need to scrape some pages for data that is very easily accessible if I could somehow use CSS selectors (jQuery has spoiled me!). Any ideas?

+7  A: 

After googling further (initial results weren't very helpful) it seems there is actually a Zend Framework library for this, along with some others:

Wilco
+1 phpQuery is absolutely wonderful.
Jonathan Sampson
+4  A: 

XPath is a fairly standard way to access XML (and XHTML) nodes, and provides much more precision than CSS.
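As a rough illustration of the XPath route, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. The sample markup and the class name "item" are made up for the example. The first query approximates the CSS selector "div.item p"; the second selects the parent of every 'strong' tag, the case raised in the comments on this answer.

```php
<?php
// Sample markup, invented for illustration only.
$html = '<div class="item"><p>First <strong>bold</strong></p></div>'
      . '<div class="other"><p>Second</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Rough XPath equivalent of the CSS selector "div.item p".
// The concat/normalize-space idiom matches "item" as a whole word
// inside a multi-valued class attribute.
$paragraphs = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " item ")]//p'
);

// The parent element of every <strong> tag.
$parents = $xpath->query('//strong/..');

foreach ($paragraphs as $p) {
    echo $p->textContent, "\n";
}
foreach ($parents as $node) {
    echo $node->nodeName, "\n";
}
```

The class test is noticeably wordier than ".item" in CSS, which is the trade-off the comments on this answer argue about.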

nickf
+1 to bring to 0, but mainly because alternatives are always good.
eyelidlessness
wow, I was downvoted for this? I'm kinda interested as to why...
nickf
Wasn't me, the OP! :-) I actually think this would be the best alternative since XHTML is just a subset of XML.
Wilco
Sometimes people here are rather random. I agreed on XPath being a better tool to use, if it's available. It's standard, more powerful and quite similar to CSS-selectors anyway.
troelskn
NickF, there's nothing more "precise" about XPath... http://ejohn.org/blog/xpath-css-selectors/ There is one more option for selection, which is nice, but the CSS selectors are a lot cleaner, and understood by a wider audience.
altCognito
See also: http://plasmasturm.org/log/444/
altCognito
In CSS you couldn't do anything like "select the parent of a 'strong' tag"
nickf
A: 

For document parsing I use DOM. This can quite easily solve your problem if you know the tag name (in this example "div"):

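 $doc = new DOMDocument();
 $doc->loadHTML($html); // $html holds the page markup as a string

 $elements = $doc->getElementsByTagName("div");
 foreach ($elements as $e) {
  // skip divs whose class attribute is not exactly "someclass"
  if ($e->getAttribute("class") != "someclass") continue;

  // $e is a div.someclass
 }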

Not sure if DOM lets you get all elements of a document at once... you might have to do a tree traversal.
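For what it's worth, getElementsByTagName() accepts the special value "*", which matches every element in the document, so a manual tree traversal should not be needed. A minimal sketch, assuming made-up sample markup and the same "someclass" name as above; it also splits the class attribute on whitespace so that elements carrying several classes still match:

```php
<?php
// Sample markup, invented for illustration only.
$html = '<div class="someclass">A</div>'
      . '<p class="big someclass">B</p>'
      . '<div>C</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$matches = [];
// "*" returns every element, regardless of tag name.
foreach ($doc->getElementsByTagName('*') as $e) {
    // Split the class attribute into individual class names.
    $classes = preg_split('/\s+/', trim($e->getAttribute('class')));
    if (in_array('someclass', $classes, true)) {
        $matches[] = $e->nodeName . ': ' . $e->textContent;
    }
}

print_r($matches);
```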

+1  A: 

For jQuery users, the most interesting option may be phpQuery, a port of jQuery to PHP. Almost all sections of the library are ported. It also includes a WebBrowser plugin, which can be used to scrape a whole path or process through a site (e.g. accessing data available only after logging in). It simply simulates a web browser on the server (events and cookies too). The latest versions have experimental support for XML namespaces and the CSS3 "|" selector.

Tobiasz Cudnik
+1  A: 

I wrote my own, based on the MooTools CSS selector engine: http://selectors.svn.exyks.org/. It relies on the SimpleXML extension (so it's read-only).

131
A: 

Another one:
http://querypath.org/

mario
A: 

A great one is a Symfony 2 component, CssSelector\Parser. It converts CSS selectors into XPath expressions. Take a look =)
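To show the general idea of turning a CSS selector into an XPath expression, here is a deliberately tiny sketch. The function simpleCssToXPath() is hypothetical and is not part of Symfony; it only handles "tag", "tag.class", "tag#id", ".class" and "#id", whereas the real component covers far more of the selector grammar. The sample markup is made up.

```php
<?php
// Hypothetical helper (NOT the Symfony API): converts a very small
// subset of CSS selectors into an XPath expression.
function simpleCssToXPath(string $selector): string
{
    $pattern = '/^([a-z][a-z0-9]*)?(?:([.#])([\w-]+))?$/i';
    if ($selector === '' || !preg_match($pattern, $selector, $m)) {
        throw new InvalidArgumentException("Unsupported selector: $selector");
    }

    // No tag name given (".class" or "#id") means "any element".
    $tag = ($m[1] ?? '') !== '' ? $m[1] : '*';

    if (!isset($m[2])) {
        return "//$tag";                       // plain "tag"
    }
    if ($m[2] === '#') {
        return "//{$tag}[@id='{$m[3]}']";      // "tag#id" or "#id"
    }
    // "tag.class" or ".class": match the name as a whole word in @class.
    return "//{$tag}[contains(concat(' ', normalize-space(@class), ' '), ' {$m[3]} ')]";
}

// Usage with the built-in DOM extension.
$doc = new DOMDocument();
$doc->loadHTML('<div id="main"><p class="note big">hi</p><p>bye</p></div>');
$xpath = new DOMXPath($doc);

$nodes = $xpath->query(simpleCssToXPath('p.note'));
echo $nodes->length, "\n";
```

In current Symfony releases the conversion is, as far as I know, exposed through the CssSelectorConverter class and its toXPath() method, so in practice you would use that rather than a hand-rolled function like this one.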

Clement Herreman