I want to extract "Date: 2009-09-25, 1:54PM EDT" from this webpage

http://auburn.craigslist.org/sha/1392067187.html

But I don't understand how to write an XPath expression for that.

Can anyone help me with this?

I am also extracting other fields from this page.

+2  A: 

Why don't you just run a regexp like the one below?

'Date:\s+([0-9]{4}-[0-9]{2}-[0-9]{2}[^<]+)'

It seems to be the easiest way: the group captures everything from the date up to, but not including, the next tag. If you would rather not work on the raw text, you can use XPath 2.0, which supports regular expressions (fn:matches).
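A minimal sketch of the regexp approach in Python, using only the standard `re` module. The `sample_html` string is a made-up stand-in for the page source (the real markup may differ), and `extract_date` is a hypothetical helper name. The pattern used here is a variant that stops just before the next tag.

```python
import re

# Hypothetical fragment standing in for the page's HTML; the real markup may differ.
sample_html = "<hr>\nDate: 2009-09-25, 1:54PM EDT<br>\n<p>Listing body</p>"

# Capture the date and everything after it up to (not including) the next '<'.
DATE_PATTERN = re.compile(r"Date:\s+([0-9]{4}-[0-9]{2}-[0-9]{2}[^<]+)")


def extract_date(html):
    """Return the text after 'Date:' up to the next tag, or None if absent."""
    match = DATE_PATTERN.search(html)
    return match.group(1).strip() if match else None


print(extract_date(sample_html))  # 2009-09-25, 1:54PM EDT
```

This works on the raw text, so it does not care whether the HTML is well-formed.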

Piotr Czapla
+1  A: 

Are you running the HTML through Tidy or some other tool to turn it into XHTML? If not, how are you able to execute XPath against that HTML?

If the document is well-formed, you could probably use the following XPath:

/html/body/hr[1]/following-sibling::text()[1]

It finds the first hr element under body, then selects the first text() node that follows it, which contains the string "Date: 2009-09-25, 1:54PM EDT".
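The same idea can be approximated with Python's standard library, assuming the page has already been converted to well-formed XHTML. `xml.etree.ElementTree` supports only a subset of XPath (no `following-sibling` axis or `text()` test), so this sketch uses the element's `tail` attribute instead, which holds the text immediately after the element. The `sample_xhtml` document and the `text_after_first_hr` helper are hypothetical; a full XPath engine should be able to evaluate the expression directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed (XHTML-style) stand-in for the page; real markup may differ.
sample_xhtml = """<html><body>
<h2>Listing title</h2>
<hr />
Date: 2009-09-25, 1:54PM EDT<br />
<hr />
<p>Listing body</p>
</body></html>"""


def text_after_first_hr(xhtml):
    """Return the text that directly follows the first <hr> under <body>."""
    root = ET.fromstring(xhtml)          # root is the <html> element
    first_hr = root.find("./body/hr")    # find() returns the first match only
    if first_hr is None or first_hr.tail is None:
        return None
    # .tail is the text between this element and the next sibling element.
    return first_hr.tail.strip()


print(text_after_first_hr(sample_xhtml))  # Date: 2009-09-25, 1:54PM EDT
```

Note that `tail` only covers the text directly after the hr, whereas `following-sibling::text()[1]` would also skip over intervening elements, so the two agree only when the date text immediately follows the hr.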

Mads Hansen
Thanks a lot. You have solved my problem.
Yatendra Goel