I'm having trouble importing HtmlUnit (htmlunit.sf.net) into a Groovy script.

I'm currently just using an example script I found on the web, and it fails with `unable to resolve class com.gargoylesoftware.htmlunit.WebClient`.

The script is:

```groovy
import com.gargoylesoftware.htmlunit.WebClient

client = new WebClient()
page = client.getPage('http://www.msnbc.msn.com/')
println page.anchors.collect{ it.hrefAttribute }.sort().unique().join('\n')
```

I downloaded the source from the website and placed the com folder (and all its contents) where my script was located.

Does anyone know what's going on? I'm not sure why the import fails.

A: 

You just need to download the zip file, extract the jar files (HtmlUnit plus the dependency jars that ship with it), and put them on the classpath when compiling or running the script. You don't need the source.

http://sourceforge.net/projects/htmlunit/files/htmlunit/2.8/htmlunit-2.8.zip/download

Aaron Saunders
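Because "unable to resolve class" is a compile-time error, the jars have to be visible before the script is compiled, for example by passing them to `groovy` with its `-cp` option. As a quick sanity check, the sketch below asks the script's own class loader whether a class can be found. `isOnClasspath` is a hypothetical helper written for this example, not part of Groovy or HtmlUnit.

```groovy
// Minimal sketch: report whether a class can be loaded by this script's class loader.
// isOnClasspath is a hypothetical helper written for this example.
boolean isOnClasspath(String className) {
    try {
        // initialize = false: only locate and load the class, don't run static initializers
        Class.forName(className, false, getClass().classLoader)
        return true
    } catch (ClassNotFoundException ignored) {
        return false
    }
}

def target = 'com.gargoylesoftware.htmlunit.WebClient'
if (isOnClasspath(target)) {
    println "${target} found: the import should resolve"
} else {
    println "${target} not found: add the HtmlUnit jars (and their dependencies) with -cp"
}
```

Run it with the same `-cp` value you plan to use for the real script; if it reports the class as missing, the `import` will fail as well.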
Thanks for the reply, Aaron. I ended up going the Grape route.
StartingGroovy
A:

You could use Grape to fetch the dependency for you when the script runs. The easiest way to do that is to add a @Grab annotation to your import statement.

Like this:

```groovy
@Grab('net.sourceforge.htmlunit:htmlunit:2.7')
import com.gargoylesoftware.htmlunit.WebClient

client = new WebClient()

// Disabled because HtmlUnit had problems with this page's JavaScript
client.javaScriptEnabled = false
// Store the result in 'page', the name the println line reads
page = client.getPage('http://www.msnbc.msn.com/')
println page.anchors.collect{ it.hrefAttribute }.sort().unique().join('\n')
```

There's only one problem: the page seems to be a bit too much for HtmlUnit to chew. When I ran the code I got an OutOfMemoryError every time. I'd suggest downloading the HTML the normal way instead and then using something like NekoHTML or TagSoup to parse it into XML and work with it that way.

This example uses TagSoup to work with HTML as XML in Groovy: http://blog.foosion.org/2008/06/09/parse-html-the-groovy-way/

xlson
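A minimal sketch of the fetch-then-parse approach, assuming Groovy 3 or later (for `groovy.xml.XmlSlurper`) and TagSoup 1.2.1 from Maven Central. It follows the same idea as the linked post (hand a TagSoup `Parser` to `XmlSlurper`) but is not taken from it, and `extractLinks` is a hypothetical helper name. It parses an inline HTML string so the parsing step can be tried without fetching a page.

```groovy
// Assumes Groovy 3+ (groovy.xml.XmlSlurper) and TagSoup 1.2.1 resolved by Grape.
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser
import groovy.xml.XmlSlurper

// Collect the distinct, sorted href values of all <a> elements in an HTML string.
// extractLinks is a hypothetical helper written for this example.
List<String> extractLinks(String html) {
    // TagSoup turns messy real-world HTML into well-formed SAX events,
    // so XmlSlurper can navigate it like XML
    def root = new XmlSlurper(new Parser()).parseText(html)
    root.'**'
        .findAll { it.name() == 'a' && it.@href.text() }
        .collect { it.@href.text() }
        .unique()
        .sort()
}

// Unclosed <p>, <a>, <body> and <html> tags are tolerated by TagSoup
def sample = '<html><body><p>Links: <a href="/b">B <a href="/a">A</a> <a href="/a">again</a> <a name="x">no href</a>'
println extractLinks(sample).join('\n')

// For a live page, fetch the text first (example URL from this thread):
// println extractLinks(new URL('http://www.perl.com/CPAN/').text).join('\n')
```

Unlike HtmlUnit, this approach does not execute any JavaScript, so links that a page builds in script will not appear.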
I gave this a shot (with multiple URLs) and the script just seems to hang. I let it run for a few minutes and then hit Ctrl+C. I actually wanted to use HtmlUnit because I was told it handles JavaScript; perhaps I was misinformed?
StartingGroovy
Got it to stop hanging. I tested it with the msnbc link and received **com.gargoylesoftware.htmlunit createElementNS INFO: Bad input type: "type", creating a text input**, so I then tried it with http://www.perl.com/CPAN/ and it told me **No such property: page for class**. I don't know why it says page doesn't exist.
StartingGroovy
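The `Bad input type` line is logged at INFO level, so it is informational rather than an error: HtmlUnit met an `<input>` whose type it did not recognize and treated it as a text field. If those messages are noise, one option, assuming HtmlUnit's commons-logging output is going through `java.util.logging` (which the log format above suggests), is to raise the level of the `com.gargoylesoftware.htmlunit` logger before creating the `WebClient`. Separately, `No such property: page` means the script reads a variable named `page` that was never assigned; the copied example stored the result of `getPage` in `html`, so the two names have to match.

```groovy
import groovy.transform.Field
import java.util.logging.Level
import java.util.logging.Logger

// Assumes HtmlUnit's commons-logging output goes through java.util.logging (JUL).
// @Field keeps a strong reference for the script's lifetime: JUL only holds
// loggers weakly, so an unreferenced logger could be collected and lose its level.
@Field final Logger htmlUnitLogger = Logger.getLogger('com.gargoylesoftware.htmlunit')

// Per-class HtmlUnit loggers inherit this level unless they set their own,
// so INFO messages such as "Bad input type" are filtered while warnings still appear.
htmlUnitLogger.level = Level.WARNING

// ... create the WebClient and fetch the page after this point ...
```

Using `Level.WARNING` rather than `Level.OFF` keeps genuine problems visible while hiding routine notices.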
At any rate, you answered my original question, so I'm going to mark this as solved. Thanks, xlson.
StartingGroovy