ansaurus

Question

How Do You Prevent A javax Transformer From Escaping Whitespace?

Answer 1

A:

Sometimes with things like this, replacing them yourself with a regex afterwards is not an entirely bad option; it at least gets you going until you find something better.

Christopher Morley 2009-06-29 19:02:41

Thanks for the suggestion. I'll use it if I absolutely can't find anything better, but my desire to avoid kludges (and my pride; my co-workers might see this code someday ;-) ) will prevent me from using it otherwise.

machineghost 2009-06-30 00:34:36

Answer 2

+1 A:

You could try using a SAXTransformerFactory in combination with an XMLReader.

Something like:

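import java.io.FileReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

SAXTransformerFactory transformFactory = (SAXTransformerFactory) TransformerFactory.newInstance();
StreamSource source = new StreamSource(TRANSFORMER_PATH);
StringWriter extractionWriter = new StringWriter();

TransformerHandler transformerHandler = null;
try {
    transformerHandler = transformFactory.newTransformerHandler(source);
    transformerHandler.setResult(new StreamResult(extractionWriter));
} catch (TransformerConfigurationException e) {
    // Pass the original exception along as the cause so its details are not lost.
    throw new SAXException("Unable to create transformerHandler due to transformer configuration exception.", e);
}

// Parse the input file (xml is its path) and feed the SAX events to the handler.
XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
reader.setContentHandler(transformerHandler);
reader.parse(new InputSource(new FileReader(xml)));
System.err.println(extractionWriter.toString());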

You should be able to set the SAX parser to not include ignorable whitespace, if it doesn't already do it by default. I haven't actually tested this, but I do something similar in one of my projects.

jwaddell 2009-06-30 06:52:51

Thanks for the suggestion, but again (as I said to Christopher Morley) an extra post-processing layer is really a kludge; what I'm really looking for is a way to tell the Transformer not to convert tabs into entity references in the first place.

machineghost 2009-06-30 16:36:23

Answer 3

A:

Is there any reason you are reading the file into a string first instead of using a file stream directly?

Instead of

String xml = FileUtils.readFileToString(new File(sampleXmlPath));
transformer.transform(new StreamSource(new StringReader(xml)),
    new StreamResult(extractionWriter));

You could try

transformer.transform(new StreamSource(new FileReader(sampleXmlPath)),
    new StreamResult(extractionWriter));

This may not be the cause of the problem, but I've seen it cause similar problems before. If your FileUtils.readFileToString is the Commons IO version, the single-argument form decodes the file using the platform default charset, which is not necessarily the UTF-8 you want.
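If encoding is the worry, one option is to skip the Reader altogether. A FileReader decodes with the JVM's default charset too, whereas a StreamSource built from a File hands the parser the raw bytes and lets it honor the encoding named in the XML declaration. The following is only a minimal sketch of that idea: the class name FileSourceExample and the helper transformFile are made up for illustration, and it uses an identity transform rather than the stylesheet from the question.

```java
import java.io.File;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FileSourceExample {
    // Hypothetical helper: runs an identity transform over the given file.
    static String transformFile(File input) throws TransformerException {
        // No stylesheet is supplied, so this Transformer copies input to output.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        // StreamSource(File) gives the parser bytes, not characters, so the
        // parser decides the encoding from the XML declaration itself.
        transformer.transform(new StreamSource(input), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Write a small UTF-8 sample containing a literal tab, then transform it.
        File sample = File.createTempFile("sample", ".xml");
        sample.deleteOnExit();
        Files.write(sample.toPath(),
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>x\ty</a>"
                .getBytes(StandardCharsets.UTF_8));
        System.out.println(transformFile(sample));
    }
}
```

To use a real stylesheet, pass a StreamSource for it to newTransformer; the rest stays the same.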

16bytes 2009-06-30 18:18:50

Although I do <3 FileUtils, in this particular case I wasn't using it at all (I experienced the same issue even running Xalan directly from the command line).

machineghost 2009-06-30 22:16:34

Answer 4

+1 A:

So the answer to this one turned out to be pretty lame: update Xalan. I don't know what was wrong with my old version, but when I switched to the latest version from http://xml.apache.org/xalan-j/downloads.html, the entity-escaping of tabs suddenly went away. Thanks everyone for all your help though.
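For anyone who wants to check whether they are hitting the same thing, here is a rough diagnostic sketch. It prints which TransformerFactory implementation JAXP actually picked up and runs an identity transform over a tiny document with a literal tab to see whether the tab comes back as a character reference. The class and method names (TransformerCheck, factoryName, escapesTabs) are invented for this example, and the string check is only a heuristic, not anything Xalan itself provides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformerCheck {
    // Class name of the TransformerFactory implementation found on this classpath.
    static String factoryName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // Identity-transforms a document whose text contains a literal tab and
    // reports whether the output holds a character reference for it.
    static boolean escapesTabs() throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<a>x\ty</a>")),
                    new StreamResult(out));
        String result = out.toString();
        // Heuristic: look for the decimal or hex reference for U+0009.
        return result.contains("&#9;") || result.contains("&#x9;");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("factory: " + factoryName());
        System.out.println("escapes tabs: " + escapesTabs());
    }
}
```

If the factory name is not the one you expected, an older Xalan jar earlier on the classpath is a likely suspect.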

machineghost 2009-06-30 22:18:42

Answer 5

A:

I need to update a node value and write the document back to an XML file:

Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(doc), new StreamResult(new File("MyTest1.xml")));

The output I get is

<test name="1">ff1 &quot;</test> <test name="2">ff1 "</test>

I need the output to be

<test name="1">ff1 "</test> <test name="2">ff1 "</test>

Can anyone help me on this?

dilip 2010-09-10 05:20:30

You should really file this as a new question; asking a question in the answer to another question is very un-Stack Overflow-ish.

machineghost 2010-09-10 18:33:49
