I want to parse a file that is similar to an HTML file but is not exactly HTML: it can contain some user-defined tags. I don't know in advance how the tags are nested in one another. The tags may also have attributes. I think I should use a SAX parser. Does Java have a built-in SAX parser? Can I call a function when I encounter each tag?

A: 

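SAX was originally Java only, so yes, Java has a built-in SAX parser - http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/parsers/SAXParser.html. This will only work if your document is well-formed XML; HTML-style input with unclosed or improperly nested tags will make the parser throw an exception.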
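
To answer the "can I call a function for each tag" part: a minimal sketch, assuming the input is well-formed XML. The class name TagLogger and the helper tagsOf are made up for illustration; SAXParserFactory, SAXParser, DefaultHandler and Attributes are the standard JDK classes. The parser calls startElement() once per opening tag, whatever the tag is called and however deeply it is nested.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records every opening tag together with its attributes.
class TagLogger extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    // Called by the parser for each opening tag, including user-defined ones.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        StringBuilder sb = new StringBuilder(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            sb.append(' ').append(attrs.getQName(i)).append('=').append(attrs.getValue(i));
        }
        seen.add(sb.toString());
    }

    // Parses an XML string and returns the recorded tags in document order.
    static List<String> tagsOf(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TagLogger handler = new TagLogger();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tagsOf("<page><mytag id=\"1\"><b>hi</b></mytag></page>"));
    }
}
```

Overriding endElement() and characters() in the same way gives you callbacks for closing tags and text content.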

stevedbrown
+2  A: 

I think you should use StAX instead, which is faster and easier to use than SAX. It's part of Java SE 6.

gustafc
I disagree with it being easier to use. startElement() in SAX essentially passes you a map of attributes. With StAX you otherwise have to write a more complicated piece of code to derive this information.
cletus
On the other hand, StAX lets you parse XML documents with a simple recursive descent parser where the call stack matches the element stack. Using SAX you'd have to write a state machine, which requires a lot more boilerplate and which at least I consider a lot harder to get right than a util method reading the attributes from a StAX cursor into a map.
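
A rough sketch of what I mean, assuming well-formed input. StaxDescent, attributes(), element() and outline() are names invented for this example; XMLInputFactory and XMLStreamReader are the standard javax.xml.stream classes. Each call to element() handles exactly one element and recurses for its children, so the Java call stack mirrors the element stack.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxDescent {
    // Utility method: copy the attributes of the current START_ELEMENT into a map.
    static Map<String, String> attributes(XMLStreamReader r) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i < r.getAttributeCount(); i++) {
            m.put(r.getAttributeLocalName(i), r.getAttributeValue(i));
        }
        return m;
    }

    // Expects the cursor on a START_ELEMENT; returns after the matching END_ELEMENT.
    static void element(XMLStreamReader r, int depth, StringBuilder out) throws XMLStreamException {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(r.getLocalName()).append(' ').append(attributes(r)).append('\n');
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                element(r, depth + 1, out); // child element: recurse
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                return; // this element is finished
            }
        }
    }

    // Builds an indented outline of all elements in the document.
    static String outline(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        StringBuilder out = new StringBuilder();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT) {
                element(r, 0, out);
            }
        }
        r.close();
        return out.toString();
    }
}
```

No explicit state machine is needed: which element you are inside is simply which invocation of element() is currently running.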
gustafc
+3  A: 

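Use the following packages: java.io, javax.xml.parsers, org.xml.sax and org.xml.sax.helpers.

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Create the parser and obtain its XMLReader.
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();

// Register the handler that receives the parsing callbacks.
reader.setContentHandler(new MyContentHandler());

// Use the XMLReader to parse the entire file.
// InputSource(String) treats its argument as a system identifier (URI).
InputSource is = new InputSource(filename);
reader.parse(is);

// DefaultHandler provides empty implementations of all ContentHandler methods,
// so only the callbacks you need have to be overridden.
class MyContentHandler extends DefaultHandler {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Called once for each opening tag; qName is the tag name.
    }
}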
adatapost