Reading content of a URL in Grails/Groovy

Is there an easy way to parse a whole HTML page and extract a specific section from it? For example, I got this URL from the RSS feed of this site: http://www.groundreport.com/Sports/Bret-Hart-says-Farewell-to-WWE_4/2918823

What I want to do is fetch that link and retrieve the related images, tags, and other info from the page. Is there a Java library or Grails plugin that can easily parse HTML?

Your suggestion on how to approach this task will be highly appreciated.

+1 A:

You can try the TagSoup library.
There is an example here.

Philippe 2010-03-04 09:44:56

Looks promising. Thanks a lot!

firnnauriel 2010-03-04 10:06:18

I've had good experiences with TagSoup for HTML parsing, +1.

Rob Hruska 2010-03-10 16:22:16

I briefly looked at WebHarvest over a year ago and it seemed nice.

wwwclaes 2010-03-04 10:14:18

If the HTML is well-formed XML, you can use any Groovy XML parsing technique. In practice, you probably can't guarantee this for arbitrary pages, so a dedicated HTML parser is the better option. In the past, I've used the Jericho HTML parser (a Java library) and have been very satisfied with the results.
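
To illustrate the well-formed case only, here is a minimal sketch that uses the DOM parser bundled with the JDK (callable from Groovy or Grails as-is) to pull the src of every img element out of a page. The class name XhtmlImageExtractor, the method imageSources, and the sample markup are made up for this example; it is not the Jericho or TagSoup API, and it assumes the markup has no DOCTYPE that would need to be fetched.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: works only when the page is well-formed XML (XHTML).
public class XhtmlImageExtractor {

    // Returns the src attribute of every <img> element, in document order.
    // Throws an exception if the markup is not well-formed XML.
    public static List<String> imageSources(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        NodeList imgs = doc.getElementsByTagName("img");
        List<String> sources = new ArrayList<>();
        for (int i = 0; i < imgs.getLength(); i++) {
            sources.add(((Element) imgs.item(i)).getAttribute("src"));
        }
        return sources;
    }

    public static void main(String[] args) throws Exception {
        // Sample markup invented for the example.
        String page = "<html><body>"
                + "<img src=\"a.png\"/><p>text</p><img src=\"b.jpg\"/>"
                + "</body></html>";
        System.out.println(imageSources(page)); // prints [a.png, b.jpg]
    }
}
```

Feeding this parser typical real-world HTML (an unclosed p or img tag, for instance) makes it throw, which is exactly why a tolerant parser such as TagSoup or Jericho is the safer choice for arbitrary pages.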

Don 2010-03-04 14:51:36
