ansaurus

Question

HtmlUnit and XPath: DOMNode.getByXPath only works on HtmlPage?

Answer 1

+2 A:

You've tried to treat an attribute as an element. Try this instead:

String link = ((DomAttr) div.getFirstByXPath("//a/@href")).getValue();

With that change, I got:

Fetching front page
Extracting article links
Found 24 articles
Title: EIF theatre review: Sin Sangre | The Man Who Fed Butterflies | Caledonia | Songs Of Ascension | Vieux Carré | The Gospel At Colonus
Intro: The EIF's theatre programme wasn't as far-reaching as it could have been, but did find an exoticism in the familiar, writes Mark Fisher
Link: /Register.aspx?ReturnURL=http%3a%2f%2fliving.scotsman.com%2fsectionhome.aspx%3fsectionID%3d7063
...

Also, your ArticleInfo class declares `link` as a String but then assigns it an object of some other (custom?) class, so I had to mangle things a bit just to get it to compile.
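One more thing that may be worth checking: the Link value in the output above is a Register.aspx URL rather than an article URL. A likely explanation, assuming HtmlUnit follows standard XPath 1.0 semantics here, is that `//a` always starts from the document root no matter which node the query is called on, so it picks up the first link on the page. Writing `.//a` restricts the search to descendants of the context node. The sketch below uses the JDK's own `javax.xml.xpath` instead of HtmlUnit so that it runs without extra dependencies; the markup, the class name and the `hrefFromDiv` helper are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RelativeXPathDemo {
    // Hypothetical markup: a site-wide link followed by one article block.
    static final String PAGE =
        "<html><body>"
        + "<a href='/Register.aspx'>Register</a>"
        + "<div class='article'><a href='/article-1'>Article one</a></div>"
        + "</body></html>";

    /** Evaluates expr with the article div as context node and returns its string value. */
    static String hrefFromDiv(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Locate the article div first, then query relative to it.
        Node div = (Node) xpath.evaluate("//div[@class='article']", doc, XPathConstants.NODE);
        return xpath.evaluate(expr, div);
    }

    public static void main(String[] args) throws Exception {
        // Absolute path: searches the whole document, ignoring the div.
        System.out.println(hrefFromDiv("//a/@href"));
        // Relative path: searches only inside the div.
        System.out.println(hrefFromDiv(".//a/@href"));
    }
}
```

If the same holds in HtmlUnit, `div.getFirstByXPath(".//a/@href")` would be the variant that stays inside the div.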

Rodney Gitzel 2010-09-08 16:25:56

`Link` was a container class holding two strings, one representing the clickable words displayed, and the other representing the URL of the linked resource. Sorry, I should have factored it out, but I was a little rushed when I wrote this! I have amended this in the above code now.

isme 2010-09-08 19:44:19

@Rodney Gitzel: +1 for catching that syntax error (`//a/@href/text()`)

Alejandro 2010-09-08 19:48:55
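For anyone who wants to see the attribute-versus-element point in isolation, here is a minimal sketch using the JDK's `javax.xml.xpath` (not HtmlUnit, so it runs with nothing but the JDK). It shows that `//a/@href` selects an attribute node whose value is read directly, while appending `/text()` asks for text-node children, which an attribute node does not have, so the result is empty. The markup, the class name and the `select` helper are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeNodeDemo {
    // Hypothetical markup with a single link.
    static final String PAGE = "<div><a href='/article-1'>Article one</a></div>";

    /** Returns every node matched by expr in the sample document. */
    static NodeList select(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        return (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        // The match is an attribute node, so cast to Attr and read its value.
        Attr href = (Attr) select("//a/@href").item(0);
        System.out.println(href.getValue());

        // Attribute nodes have no child nodes, so this matches nothing.
        System.out.println(select("//a/@href/text()").getLength());

        // Elements do have text children: this is the clickable text.
        System.out.println(select("//a/text()").item(0).getNodeValue());
    }
}
```

The cast to `Attr` here plays the same role as the cast to `DomAttr` in the answer's one-liner.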
