Hey, I'd like to scrape some data from my blog using YQL:

SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']"

How can I use different bits of xpath in my query? E.g. can I do something like:

SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']" AND xpath ="//div[@class='title']"

assuming I want to get the post and the title? I guess I could take in all the HTML but I'd rather only take what I need as speed is an issue here.

Once I have the HTML, I want to extract the text from the markup. Is it OK to use PHP regular expressions for this?

I also understand you can use CSS syntax. If you have experience using this with YQL and could show me how to write a query similar to the one above, but in CSS rather than XPath, I'd be grateful!

Thanks.

A: 

It is not possible. You need to execute the query twice: once for the first XPath expression and once for the second. Of course, you can also write your own open data table and add support for this kind of query.
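A minimal sketch of the two-request approach, in Python rather than PHP purely for illustration. The endpoint URL and the `format` parameter are assumptions (they match the public YQL endpoint as commonly documented, so check them before relying on this), and `build_yql_urls` is a hypothetical helper name. It only builds the request URLs and does not send anything:

```python
from urllib.parse import urlencode

# Assumed public YQL endpoint; verify against the YQL documentation.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_urls(page_url, xpaths):
    """Return one YQL request URL per XPath expression (hypothetical helper)."""
    urls = []
    for xpath in xpaths:
        # One SELECT statement per XPath expression.
        query = f'SELECT * FROM html WHERE url="{page_url}" AND xpath="{xpath}"'
        # urlencode percent-encodes the statement for use in a query string.
        urls.append(YQL_ENDPOINT + "?" + urlencode({"q": query, "format": "json"}))
    return urls


urls = build_yql_urls(
    "http://site.com/blog",
    ["//div[@class='post']", "//div[@class='title']"],
)
for url in urls:
    print(url)
```

Each URL can then be fetched separately and the two result sets merged on the client side.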

Skarab
Ok thanks for the info!
Umar
A: 

Regarding CSS:

See the YQL website itself for this. Search Google for YQL and CSS (I can only post one link in here and the 2nd one is more useful.)

The example they have there no longer works, but you can try out this example, which scrapes the questions from the front page of Stack Overflow.

YQL example

Multiple selects with one XPath:

You CAN do this directly in XPath syntax by joining the expressions with the union operator (|), e.g.

SELECT * FROM html WHERE url="http://www.asscompact.de" AND xpath="//head/meta[@name='title']|//head/meta[@name='description']|//head/meta[@name='keywords']"
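Applied to the post/title case from the question, the same idea can be sketched in Python (used here only for illustration). `build_union_query` is a hypothetical helper name, and the sketch only assembles the YQL statement as a string:

```python
def build_union_query(page_url, xpaths):
    """Combine several XPath expressions into one YQL statement (hypothetical helper)."""
    # The XPath union operator "|" selects nodes matching any of the expressions.
    combined = "|".join(xpaths)
    return f'SELECT * FROM html WHERE url="{page_url}" AND xpath="{combined}"'


query = build_union_query(
    "http://site.com/blog",
    ["//div[@class='post']", "//div[@class='title']"],
)
print(query)
```

This needs only one request, but the matched nodes come back in a single result set, so the posts and titles still have to be told apart afterwards.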
sspier
Thanks, wasn't sure about the syntax but that's cleared it up.
Umar