Sorry if this is too simple, but I couldn't find a tutorial or any documentation for the Java version of TagSoup.

Basically, I want to download an HTML page from the internet and convert it to XHTML held in a string. How can I do this with TagSoup?

Thanks!

+5  A: 

Something like this:

wget -O - example.com/bad.html | java -jar tagsoup.jar

Or, from Java:

To parse HTML:

  • Create an instance of org.ccil.cowan.tagsoup.Parser
  • Provide your own SAX2 ContentHandler
  • Provide an InputSource referring to the HTML
  • And parse()!
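
To get the result into a string, one option is to use the JDK's identity TransformerHandler as the ContentHandler and point it at a StringWriter. This is only a rough sketch: the class name HtmlToXhtml and the method toXmlString are made up for illustration, and main() uses the JDK's own SAX parser as a stand-in reader so it compiles and runs without tagsoup.jar.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class, not part of TagSoup.
public class HtmlToXhtml {

    /** Parses the input with the given SAX2 reader and returns the serialized events as a string. */
    public static String toXmlString(XMLReader reader, InputSource input) throws Exception {
        // The JDK's default TransformerFactory also implements SAXTransformerFactory.
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();

        // An identity TransformerHandler is a ContentHandler that writes out the events it receives.
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        reader.setContentHandler(handler);
        reader.parse(input);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader from the JDK; it only accepts well-formed input.
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();

        InputSource input = new InputSource(new StringReader("<p>Hello</p>"));
        System.out.println(toXmlString(reader, input));
    }
}
```

With tagsoup.jar on the classpath, you would pass new org.ccil.cowan.tagsoup.Parser() as the reader instead (it implements XMLReader), and something like new InputSource("http://example.com/bad.html") as the input. Check the output on your own pages before relying on it.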
Pascal Thivent
Precisely this! I want to get this value into a string in Java.
konr
@konr, if this answer is precisely what you wanted, you might want to accept/up-vote it =)
David Thomas