Is there an easy way to parse a whole HTML page and extract a specific section from it? For example, I got this URL from the site's RSS feed: http://www.groundreport.com/Sports/Bret-Hart-says-Farewell-to-WWE_4/2918823

What I want to do is parse the page behind that link and retrieve related images, tags, and other info from it. Is there a Java library or Grails plugin that can easily parse HTML?

Any suggestions on how to approach this task would be highly appreciated.

+1  A: 

You can try the TagSoup library.
There is an example here.

Philippe
Looks promising. Thanks a lot!
firnnauriel
I've had good experiences with TagSoup for HTML parsing, +1.
Rob Hruska
A: 

I briefly looked at WebHarvest over a year ago and it seemed nice.

wwwclaes
A: 

If the HTML is well-formed XML, you can use any Groovy XML parsing technique. In practice, you probably won't be able to guarantee this, so an HTML parser is a better option. In the past, I've used the Jericho HTML parser (a Java library) and have been very satisfied with the results.
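
To illustrate the well-formed case only, here is a minimal sketch in plain Java using the JDK's built-in DOM parser; it is not Jericho or TagSoup. The class name `XhtmlImageExtractor` and the method `imageSources` are made up for this example, and it assumes the markup is already well-formed XML with no DOCTYPE. On typical real-world HTML (unclosed `<img>` or `<br>` tags, for instance) it will simply fail, which is exactly why a lenient HTML parser is usually the safer choice.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of any library named in this thread.
final class XhtmlImageExtractor {

    // Returns the src attribute of every <img> element, in document order.
    // Assumption: the input is well-formed XML (XHTML-style) without a DOCTYPE.
    static List<String> imageSources(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            NodeList images = doc.getElementsByTagName("img");
            List<String> sources = new ArrayList<>();
            for (int i = 0; i < images.getLength(); i++) {
                Element img = (Element) images.item(i);
                // Skip <img> elements that carry no src attribute.
                if (img.hasAttribute("src")) {
                    sources.add(img.getAttribute("src"));
                }
            }
            return sources;
        } catch (Exception e) {
            // Tag-soup HTML ends up here, because the XML parser rejects it.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

Calling `XhtmlImageExtractor.imageSources(pageSource)` on a well-formed page gives you the image URLs; the same `getElementsByTagName` approach works for other elements you want to pull out. If the call throws, the page is not valid XML and you need a real HTML parser instead.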

Don