Following on from this question, http://stackoverflow.com/questions/3850315/java-splitting-up-a-large-xml-file-with-saxparser, I'm essentially reading in an XML file using SAXParser and echoing it to another file.

My problem is that my input file contains character references, which are decoded as the file is read in. How can I stop this? I want to write the references out exactly as they appear in the input, without decoding them.

(I can't give an example, as the references are decoded when the page is rendered!)

+1  A: 

dom4j's XMLWriter class will re-encode these characters. For example this code:

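import org.dom4j.io.XMLWriter;
import org.xml.sax.helpers.AttributesImpl;

XMLWriter writer = new XMLWriter(System.out);
writer.startElement(null, null, "example", new AttributesImpl());
// write() escapes markup characters in the text on output
writer.write(">");
writer.endElement(null, null, "example");
writer.flush();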

will produce this output:

<example>&gt;</example>
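
If adding dom4j is not an option, the same re-encoding idea can be sketched by hand. The class below is only an illustration: `ReferenceEscaper` and `escape` are hypothetical names, and the policy of writing every non-ASCII code point as a decimal character reference is an assumption. Note that re-encoding cannot tell which characters were written as references in the original input, so the output is equivalent XML rather than a byte-for-byte copy.

```java
final class ReferenceEscaper {
    private ReferenceEscaper() {}

    // Escapes markup characters and writes every code point above
    // printable ASCII as a decimal character reference (assumed policy).
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (cp > 0x7E) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <example>a &gt; b, caf&#233;</example>
        System.out.println("<example>" + escape("a > b, caf\u00e9") + "</example>");
    }
}
```

In a SAX handler, a method like this would be applied to the text received in characters() before it is written to the output file.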
Richard Fearn
+1  A: 

I don't think you can do this with SAX. However, you can tell a StAX parser (as opposed to SAX) not to decode entity references when parsing (see this prior answer); the standard switch for this is the XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES property. You should then be able to echo the references to the output in the same form in which the parser read them.

StAX should perform just as well as SAX.
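
A minimal sketch of that approach, using only the StAX API that ships with the JDK. `EntityEcho` and `echo` are hypothetical names, and the echo logic is deliberately simplified: attributes, namespaces, comments and the DOCTYPE are ignored. One caveat, stated as my understanding rather than a guarantee: the property governs general entity references declared in a DTD, while numeric character references and the predefined entities may still arrive already decoded, depending on the implementation.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class EntityEcho {
    private EntityEcho() {}

    // Echoes elements and text, writing unexpanded entity references back as &name;
    static String echo(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to report entity references instead of replacing them
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        StringBuilder out = new StringBuilder();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        out.append('<').append(reader.getLocalName()).append('>');
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        out.append("</").append(reader.getLocalName()).append('>');
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Decoded text must be re-escaped to stay well-formed
                        out.append(reader.getText().replace("&", "&amp;").replace("<", "&lt;"));
                        break;
                    case XMLStreamConstants.ENTITY_REFERENCE:
                        // For this event, getLocalName() is the entity name
                        out.append('&').append(reader.getLocalName()).append(';');
                        break;
                    default:
                        break; // other event types are skipped in this sketch
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse input", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE r [<!ENTITY foo \"bar\">]><r>a&foo;b</r>";
        System.out.println(echo(xml));
    }
}
```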

skaffman