I have a Java String with SGML, something like this...

<misspell></misspell><plain>I</plain> <plain>know</plain> <plain>you</plain> <suggestion>ducky</suggestion> <plain>suck</plain> <plain>and</plain> <plain>I</plain> <plain>rocky</plain> <plain>rock</plain>

How do I parse it to get, for instance, the text inside <suggestion></suggestion>, so as to get "ducky" out?

Can javax.swing.text.html.parser.Parser be of any help, or can it only parse HTML documents?

A: 

The string you show is not HTML, but it could be parsed by an XML parser.

The SAX API is part of the JDK and AFAIK most XML parsers implement it.

Péter Török
It's just a plain String. Will the SAX API (javax.xml.parsers) work?
Myth17
@Myth, from the [Javadoc](http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/parsers/SAXParser.html): "XML can be parsed from a variety of input sources. These input sources are InputStreams, Files, URLs, and SAX InputSources". And it seems to be possible to construct an InputSource using a StringReader.
Péter Török
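
A minimal sketch of the StringReader approach discussed above, using only the SAX classes shipped with the JDK. One assumption worth stating: the sample string has several top-level elements, so it is not a well-formed XML document by itself, and the sketch wraps it in a made-up `<root>` element before parsing. The class name `SuggestionExtractor` and the method `extract` are hypothetical names chosen for illustration, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text content of every element named tagName.
public class SuggestionExtractor {

    public static List<String> extract(String fragment, String tagName) throws Exception {
        // Assumption: the fragment is well-formed apart from lacking a single root,
        // so we add an arbitrary wrapper element to make it a valid XML document.
        String wrapped = "<root>" + fragment + "</root>";
        List<String> results = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while we are inside a matching element.
            private StringBuilder current;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(tagName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver text in several chunks, so append rather than assign.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(tagName) && current != null) {
                    results.add(current.toString());
                    current = null;
                }
            }
        };

        // An InputSource built from a StringReader lets SAX read directly from the String.
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(wrapped)), handler);
        return results;
    }
}
```

Calling `SuggestionExtractor.extract(yourString, "suggestion")` on the string from the question should return a list containing "ducky". This only works while the markup stays well-formed; real SGML features such as omitted end tags would make the XML parser throw an exception.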
A: 

Try an HTML parser. They are (by necessity) quite forgiving of malformed markup, and HTML is by nature based on SGML.

e.g. http://htmlparser.sourceforge.net/

Marc van Kempen