I have an object that is being marshalled to XML using JAXB. One element contains a String that includes double quotes ("). The resulting XML has &quot; wherever a " existed.

Even though this is normally preferred, I need my output to match a legacy system. How do I force JAXB NOT to escape these characters as entities?

--

Thank you for the replies. However, I never see the handler escape() called. Can you take a look and see what I'm doing wrong? Thanks!

package org.dc.model;

import java.io.IOException;
import java.io.Writer;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;

import org.dc.generated.Shiporder;

import com.sun.xml.internal.bind.marshaller.CharacterEscapeHandler;

public class PleaseWork {
    public void prettyPlease() throws JAXBException {
     Shiporder shipOrder = new Shiporder();
     shipOrder.setOrderid("Order's ID");
     shipOrder.setOrderperson("The woman said, \"How ya doin & stuff?\"");

     JAXBContext context = JAXBContext.newInstance("org.dc.generated");
     Marshaller marshaller = context.createMarshaller();
     marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
     marshaller.setProperty(CharacterEscapeHandler.class.getName(),
       new CharacterEscapeHandler() {
        @Override
        public void escape(char[] ch, int start, int length,
          boolean isAttVal, Writer out) throws IOException {
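         // new String(...) prints the characters; ch.toString() would only print the array's identity
         out.write("Called escape for characters = " + new String(ch, start, length));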
        }
       });
     marshaller.marshal(shipOrder, System.out);
    }

    public static void main(String[] args) throws Exception {
     new PleaseWork().prettyPlease();
    }
}

--

The output is this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<shiporder orderid="Order's ID">
    <orderperson>The woman said, &quot;How ya doin &amp; stuff?&quot;</orderperson>
</shiporder>

and as you can see, the callback's output never appears. (Once I get the callback being called, I'll worry about having it actually do what I want.)

--

+2  A: 

Seems like it is possible with Sun's JAXB implementation, although I've not done it myself.

laz
A: 

Solution my teammate found:

PrintWriter printWriter = new PrintWriter(new FileWriter(xmlFile));
DataWriter dataWriter = new DataWriter(printWriter, "UTF-8", DumbEscapeHandler.theInstance);
marshaller.marshal(request, dataWriter);

Instead of passing the xmlFile to marshal(), pass the DataWriter which knows both the encoding and an appropriate escape handler, if any.

Note: Since DataWriter and DumbEscapeHandler are both within the com.sun.xml.internal.bind.marshaller package, you must bootstrap javac.

Elliot
Did you try @laz's answer? That looks like the way to do it "properly".
skaffman
+1  A: 

Hey, I've been playing with your example a bit and debugging the JAXB code. It seems to be something specific to the UTF-8 encoding. The escapeHandler property of MarshallerImpl is set properly, but it isn't used in every context. Searching for calls to MarshallerImpl.createEscapeHandler(), I found:

public XmlOutput createWriter( OutputStream os, String encoding ) throws JAXBException {
    // UTF8XmlOutput does buffering on its own, and
    // otherwise createWriter(Writer) inserts a buffering,
    // so no point in doing a buffering here.

    if(encoding.equals("UTF-8")) {
        Encoded[] table = context.getUTF8NameTable();
        final UTF8XmlOutput out;
        if(isFormattedOutput())
            out = new IndentingUTF8XmlOutput(os,indent,table);
        else {
            if(c14nSupport)
                out = new C14nXmlOutput(os,table,context.c14nSupport);
            else
                out = new UTF8XmlOutput(os,table);
        }
        if(header!=null)
            out.setHeader(header);
        return out;
    }

    try {
        return createWriter(
            new OutputStreamWriter(os,getJavaEncoding(encoding)),
            encoding );
    } catch( UnsupportedEncodingException e ) {
        throw new MarshalException(
            Messages.UNSUPPORTED_ENCODING.format(encoding),
            e );
    }
}

Note that in your setup the top branch (encoding.equals("UTF-8")) is taken, and that branch never consults the escapeHandler. If you set any other encoding, the bottom part of the method runs instead; it delegates to createWriter(Writer, String), which does use the escapeHandler, so your handler gets its turn. So, adding...

    marshaller.setProperty(Marshaller.JAXB_ENCODING, "ASCII");

makes your custom CharacterEscapeHandler get called. I'm not really sure, but I would guess this is a bug of sorts in JAXB.
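
Once the handler is actually being invoked, its body still has to decide what to escape. Below is a minimal sketch, assuming the goal from the question: leave double quotes alone in element text while still escaping the characters that would break well-formedness. The class name QuotePreservingEscaper and the helper escapeText are hypothetical. The escape method only mirrors the escape(char[], int, int, boolean, Writer) signature from the question and deliberately does not implement CharacterEscapeHandler, so it compiles without the internal JAXB classes; the anonymous handler could simply delegate to it.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of JAXB.
final class QuotePreservingEscaper {

    // Same parameter list as the escape(...) callback shown in the question.
    static void escape(char[] ch, int start, int length,
                       boolean isAttVal, Writer out) throws IOException {
        int end = start + length;
        for (int i = start; i < end; i++) {
            char c = ch[i];
            switch (c) {
                case '&': out.write("&amp;"); break;
                case '<': out.write("&lt;");  break;
                case '>': out.write("&gt;");  break;
                case '"':
                    // Inside a double-quoted attribute value the quote must stay escaped;
                    // in element text it is written through unchanged.
                    if (isAttVal) out.write("&quot;"); else out.write('"');
                    break;
                default:
                    // Assumption: every other character is representable in the
                    // output encoding. No numeric character references are emitted.
                    out.write(c);
            }
        }
    }

    // Convenience wrapper for element text (isAttVal = false).
    static String escapeText(String s) throws IOException {
        StringWriter w = new StringWriter();
        char[] chars = s.toCharArray();
        escape(chars, 0, chars.length, false, w);
        return w.toString();
    }

    public static void main(String[] args) throws IOException {
        // prints: The woman said, "How ya doin &amp; stuff?"
        System.out.println(escapeText("The woman said, \"How ya doin & stuff?\""));
    }
}
```

One caveat if you combine this with the "ASCII" encoding workaround: the sketch does nothing special for characters outside ASCII, so a real handler would probably have to write those as numeric character references itself.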

Grzegorz Oledzki
Thanks for your response, Grzegorz. I agree with you, it appears to be a JAXB bug. And if there is a legitimate reason for it, it'd be nice to have it in the documentation. Thanks!
Elliot
I've filed a bug report in the JAXB issue tracker: https://jaxb.dev.java.net/issues/show_bug.cgi?id=693
Grzegorz Oledzki
A: 

Well. I have a similar issue (I think).

I need to keep the default escaping behavior, but also escape the single quote as "&apos;".

I'm using the latest stable build: jaxb-ri-20091104

And the CharacterEscapeHandler "double" escapes the strings. What I mean is that instead of translating the single quote to "&apos;" it translates it to "&amp;apos;"

I opened an issue a month ago, but nobody has even commented about it yet: https://jaxb.dev.java.net/issues/show_bug.cgi?id=741

Any Ideas?

Ely Sch.
+1  A: 

I checked the XML specification. http://www.w3.org/TR/REC-xml/#sec-references says "well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot." So it appears that the XML parser used by the legacy system is not conformant.

(I know that it does not solve your problem, but it is at least nice to be able to say which component is broken).
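
To illustrate the point, here is a small sketch using the JDK's built-in DOM parser (javax.xml.parsers), which I'm assuming is a fair stand-in for "a conformant parser". It parses the same element once with &quot; and once with a literal ", then compares the resulting text. The class name PredefinedEntityCheck and the helper textOf are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: checks that &quot; and a literal " parse to the same text.
final class PredefinedEntityCheck {

    // Parses the given XML string and returns the text content of its root element.
    static String textOf(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement()
                      .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String escaped = textOf(
            "<orderperson>said, &quot;How ya doin &amp; stuff?&quot;</orderperson>");
        String literal = textOf(
            "<orderperson>said, \"How ya doin &amp; stuff?\"</orderperson>");

        System.out.println(escaped);                 // said, "How ya doin & stuff?"
        System.out.println(escaped.equals(literal)); // true
    }
}
```

Both forms come out as the same string, so a consumer that distinguishes between them is looking at the raw bytes rather than at the parsed XML.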

Thorbjørn Ravn Andersen
+1  A: 

I have a similar problem, but the solutions presented here won't work for me. Any insight would be helpful.

The difference is that I'm using JAXB to call a remote web service. The poor escape behavior is somewhere in the handling of this service call, so I don't have access to the marshaller instance being used.

Any idea how to get at it, or set some magical property somewhere to make it escape the way I want?

Specifically, it is escaping my '<' characters but not the '>', which I need for communicating with my legacy system.

Bobman