I'm using the javax.xml.transform.Transformer class to perform some XSLT translations, like so:

TransformerFactory factory = TransformerFactory.newInstance();
StreamSource source = new StreamSource(TRANSFORMER_PATH);
Transformer transformer = factory.newTransformer(source);
StringWriter extractionWriter = new StringWriter();
String xml = FileUtils.readFileToString(new File(sampleXmlPath));
transformer.transform(new StreamSource(new StringReader(xml)),
     new StreamResult(extractionWriter));
System.err.println(extractionWriter.toString());

However, no matter what I do, I can't seem to stop the transformer from converting any tabs that were in the source document into their character reference equivalent (&#9;). I have tried both:

transformer.setParameter("encoding", "UTF-8");

and:

transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

but neither of those help. Does anyone have any suggestions? Because:

&#9;&#9;&#9;&#9;&#9;<MyElement>

looks really stupid (even if it does work).

A: 

Sometimes with things like this, replacing them yourself with regex afterwards is not an entirely bad option, which at least gets you going until you find a better option later.
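As a rough sketch of what that post-processing could look like (the class and method names here are made up for illustration, not part of any library), a single regex can turn the numeric character references for tab back into literal tabs. Note that it is blunt: it also rewrites references inside attribute values, where a literal tab would be normalized to a space when the document is parsed again.

```java
public class TabEntityCleanup {
    // Hypothetical helper: replaces numeric character references for TAB
    // (&#9;, &#09;, &#x9;, &#x09;, ...) with a literal tab character.
    static String unescapeTabs(String xml) {
        return xml.replaceAll("&#(?:0*9|[xX]0*9);", "\t");
    }

    public static void main(String[] args) {
        // Stand-in for the transformer output from the question.
        String transformed = "&#9;&#9;<MyElement>text</MyElement>";
        System.out.println(unescapeTabs(transformed));
    }
}
```

In the question's code this would be applied to extractionWriter.toString() after the transform.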

Christopher Morley
Thanks for the suggestion. I'll use it if I absolutely can't find anything better, but my desire to avoid kludges (and my pride; my co-workers might see this code someday ;-) ) will prevent me from using it otherwise.
machineghost
+1  A: 

You could try using a SAXTransformerFactory in combination with an XMLReader.

Something like:

SAXTransformerFactory transformFactory = (SAXTransformerFactory) TransformerFactory.newInstance();
StreamSource source = new StreamSource(TRANSFORMER_PATH);
StringWriter extractionWriter = new StringWriter();

TransformerHandler transformerHandler = null;
try {
    transformerHandler = transformFactory.newTransformerHandler(source);
    transformerHandler.setResult(new StreamResult(extractionWriter));
} catch (TransformerConfigurationException e) {
    // Pass the original exception along as the cause so its stack trace is not lost.
    throw new SAXException("Unable to create transformerHandler due to transformer configuration exception.", e);
}

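// XSLT needs namespace information, and SAXParserFactory is not namespace-aware by default.
SAXParserFactory parserFactory = SAXParserFactory.newInstance();
parserFactory.setNamespaceAware(true);
XMLReader reader = parserFactory.newSAXParser().getXMLReader();
reader.setContentHandler(transformerHandler);
// FileReader expects a file path (sampleXmlPath in the question), not the XML text itself.
reader.parse(new InputSource(new FileReader(sampleXmlPath)));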
System.err.println(extractionWriter.toString());

You should be able to set the SAX parser to not include ignorable whitespace, if it doesn't already do it by default. I haven't actually tested this, but I do something similar in one of my projects.

jwaddell
Thanks for the suggestion, but again (as I said to Christopher Morley) an extra processing layer is really a kludge; what I'm really looking for is a way to tell the Transformer not to convert tabs into entity references in the first place.
machineghost
A: 

Is there any reason you are reading the file into a string first instead of using a file stream directly?

Instead of

String xml = FileUtils.readFileToString(new File(sampleXmlPath));
transformer.transform(new StreamSource(new StringReader(xml)),
    new StreamResult(extractionWriter));

You could try

transformer.transform(new StreamSource(new FileReader(sampleXmlPath)),
    new StreamResult(extractionWriter));

This may not be the cause of the problem, but I've seen it cause similar problems before. If your FileUtils.readFileToString is the Commons IO version and you don't pass it an encoding, it decodes the file using the platform default charset, which is not necessarily the UTF-8 you want.
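A related variant, sketched here as an assumption rather than something tested against the original setup: handing the Transformer a File-based StreamSource lets the XML parser read the raw bytes and take the encoding from the XML declaration, so no Reader charset is involved at all. The class name, the temporary sample file and the stylesheet-less identity transformer below are placeholders for illustration only.

```java
import java.io.File;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FileSourceExample {
    // Transforms the given file without ever decoding it into a String first.
    static String transformFile(File xmlFile) throws Exception {
        // Identity transformer (no stylesheet), used only to keep the sketch self-contained.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        // The parser reads bytes from the file and detects the encoding itself.
        transformer.transform(new StreamSource(xmlFile), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document written as UTF-8 bytes.
        File sample = File.createTempFile("sample", ".xml");
        sample.deleteOnExit();
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>caf\u00e9</root>";
        Files.write(sample.toPath(), xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(transformFile(sample));
    }
}
```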

16bytes
Although I do <3 FileUtils, in this particular case I wasn't using it at all (I experienced the same issue even running Xalan directly from the command line).
machineghost
+1  A: 

So the answer to this one turned out to be pretty lame: update Xalan. I don't know what was wrong with my old version, but when I switched to the latest version at: http://xml.apache.org/xalan-j/downloads.html suddenly the entity-escaping of tabs just went away. Thanks everyone for all your help though.
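For anyone unsure which XSLT implementation (and therefore which Xalan, if any) their code is actually picking up, a quick check along these lines may help. It is only a sketch: the class name is invented for the example, and the version string comes from the jar manifest, so it can be missing for some implementations.

```java
import javax.xml.transform.TransformerFactory;

public class TransformerInfo {
    // Reports the concrete TransformerFactory class chosen at runtime and,
    // if the jar manifest provides it, the implementation version.
    static String describeFactory() {
        TransformerFactory factory = TransformerFactory.newInstance();
        Package pkg = factory.getClass().getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return factory.getClass().getName()
                + " (version: " + (version == null ? "unknown" : version) + ")";
    }

    public static void main(String[] args) {
        System.out.println(describeFactory());
    }
}
```

If the class name starts with com.sun.org.apache.xalan.internal, the JDK's bundled transformer is in use rather than a separately downloaded Xalan jar.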

machineghost
A: 

I need to update a node value and write it back into an XML file:

Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(doc), new StreamResult(new File("MyTest1.xml")));

The output I get is

<test name="1">ff1 &amp;quot;</test> <test name="2">ff1 "</test>

I need the output as

<test name="1">ff1 &quot;</test> <test name="2">ff1 "</test>

Can anyone help me on this?

dilip
You should really file this as a new question; asking a question in the answer to another question is very un-Stack Overflow-ish.
machineghost