Hi, reading the documentation for Java's org.w3c.dom.ls package, it seems that an Element can only be serialized to a String in Java's native string encoding, UTF-16. However, I need to create a UTF-8 string (escaped or otherwise); I understand that it will still be a UTF-16 String internally. Does anyone have an idea how to get around this? I need to pass the string to a generated WS client that will consume it, and at that point it should be UTF-8.

The code I use to create the string:

    // The first line was cut off in the original post; newInstance() is assumed here.
    DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
    DOMImplementationLS domImplementationLS = (DOMImplementationLS) registry.getDOMImplementation("LS");
    LSSerializer writer = domImplementationLS.createLSSerializer();
    String result = writer.writeToString(element);
+1  A: 

I find that the most flexible way of serializing a DOM to String is to use the javax.xml.transform API:

    Node node = ...
    StringWriter output = new StringWriter();

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.transform(new DOMSource(node), new StreamResult(output));

    String xml = output.toString();

It's not especially elegant, but it should give you better control over output encoding.
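As a rough sketch of what controlling the encoding could look like with this API: the snippet below sets `OutputKeys.ENCODING` explicitly and writes to a byte stream instead of a `StringWriter`, so the result is actual UTF-8 bytes rather than a Java String. The class name `Utf8DomSerializer`, the helper `toUtf8Bytes`, and the sample `greeting` document are made up for illustration and are not part of the original answer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class Utf8DomSerializer {

    // Hypothetical helper: serializes a DOM node to UTF-8 encoded bytes.
    static byte[] toUtf8Bytes(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Ask for UTF-8 explicitly instead of relying on the default.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // A byte stream lets the transformer perform the real encoding step.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny sample document containing a non-ASCII character.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.setTextContent("h\u00e9llo");
        doc.appendChild(root);

        byte[] bytes = toUtf8Bytes(doc);
        // Decode only for display; the bytes themselves are the UTF-8 form.
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

If the consumer really does take a `String`, the bytes can be decoded with `new String(bytes, StandardCharsets.UTF_8)`, but the UTF-8 encoding then only matters again once that String is written to a stream.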

skaffman
Works like a charm, but how do I set the encoding explicitly? Does this generate UTF-8 with no configuration?
Tomas
That's up to the `Writer` implementation that you use. `StringWriter` just happens to default to UTF-8, I think.
skaffman
@skaffman - "StringWriter just happens to default to UTF-8". You are mistaken. The String is UTF-16; the transformer might add an XML header that says `<?xml version="1.0" encoding="UTF-8"?>`, but that has nothing to do with any actual encoding operations.
McDowell
Yeah, that makes sense.
skaffman
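
To illustrate McDowell's point with the `org.w3c.dom.ls` API from the question, here is a minimal sketch, assuming the JDK's built-in DOM implementation. Instead of `writeToString`, it hands the serializer an `LSOutput` with an explicit encoding and a byte stream, so the encoding is applied to bytes rather than to a String. The class name `LsUtf8Example`, the helper `serializeUtf8`, and the sample document are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class LsUtf8Example {

    // Hypothetical helper: serializes a node to UTF-8 bytes via DOM Load/Save.
    static byte[] serializeUtf8(Node node) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS ls = (DOMImplementationLS) registry.getDOMImplementation("LS");

        // The LSOutput carries both the target stream and the requested encoding.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");
        output.setByteStream(bytes);

        LSSerializer serializer = ls.createLSSerializer();
        serializer.write(node, output);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.setTextContent("h\u00e9llo");
        doc.appendChild(root);

        byte[] utf8 = serializeUtf8(doc);
        String decoded = new String(utf8, StandardCharsets.UTF_8);

        // The non-ASCII character takes two bytes in UTF-8 but one char in the
        // String, so the byte count is larger than the String length.
        System.out.println(utf8.length + " bytes, " + decoded.length() + " chars");
        System.out.println(decoded);
    }
}
```

The String obtained by decoding is held as UTF-16 in memory like any other Java String; only the byte array is UTF-8.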