Alright, so here is my issue. I need to generate XML in Java to pass on to another application. I started off thinking this would be easy using an org.w3c.dom.Document. Unfortunately, the application I need to pass the XML to requires that special characters like " be encoded as their numeric ASCII code (&#034;) instead of their character entity (&quot;). Does anybody know a simple solution to this?

P.S. Changing the target application is not an option.

Update: So let's say my app is given the following string as input:

he will "x" this if needed

My app needs to output this:

<field value="he will &#034;x&#034; this if needed"/>

The XML generator I am using (and, I am guessing, most others) outputs this, which is not valid for my target:

<field value="he will &quot;x&quot; this if needed"/>

I realize my target may not quite be up to XML standards, but that doesn't help me as I have no control over it. This is my situation and I have to deal with it. Any ideas other than simply converting every special character by hand?

A: 

To my knowledge, the standard API doesn't expose the escape mechanism. You'd probably need to write your own XML emitter.
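As a rough idea of what the escaping part of a hand-written emitter could look like, here is a minimal sketch in plain Java. The class name AttributeEscaper and the method escape are made up for illustration, and the three-digit zero padding is only an assumption based on the &#034; form shown in the question. It does not filter characters that XML forbids outright.

```java
import java.util.Locale;

public final class AttributeEscaper {
    private AttributeEscaper() {}

    /** Escapes an attribute value using only decimal character references. */
    public static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); ) {
            // Read whole code points so surrogate pairs yield one reference.
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            boolean special = cp == '"' || cp == '\'' || cp == '&'
                    || cp == '<' || cp == '>';
            if (special || cp > 127) {
                // Zero-pad to at least three digits (assumed target format).
                out.append("&#")
                   .append(String.format(Locale.ROOT, "%03d", cp))
                   .append(';');
            } else {
                out.append((char) cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = escape("he will \"x\" this if needed");
        // prints <field value="he will &#034;x&#034; this if needed"/>
        System.out.println("<field value=\"" + value + "\"/>");
    }
}
```

Element names and the rest of the markup would still have to be written out by hand, so this only pays off if the documents are simple.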

If you don't mind a 3rd party API, you could use JDOM. Something like:

// Uses org.jdom.Document, org.jdom.Element, org.jdom.output.Format and
// org.jdom.output.XMLOutputter (JDOM 1.x package names).
XMLOutputter outputter = new XMLOutputter() {
  @Override
  public String escapeAttributeEntities(String sequence) {
    StringBuilder out = new StringBuilder();
    // Walk the string by code point so that a surrogate pair (a character
    // above U+FFFF) becomes one numeric reference instead of two.
    for (int i = 0; i < sequence.length(); ) {
      int codePoint = sequence.codePointAt(i);
      process(codePoint, out);
      i += Character.charCount(codePoint);
    }
    return out.toString();
  }

  private void process(int codePoint, StringBuilder out) {
    if (codePoint == '"' || codePoint == '\'' || codePoint == '&'
        || codePoint == '<' || codePoint == '>' || codePoint > 127) {
      // Emit a decimal numeric character reference, e.g. &#34; for "
      out.append("&#");
      out.append(Integer.toString(codePoint));
      out.append(";");
    } else {
      out.appendCodePoint(codePoint);
    }
  }
};
outputter.setFormat(Format.getPrettyFormat().setEncoding("US-ASCII"));

Element foo = new Element("foo").setAttribute("msg",
    "he will \"x\" this if needed");
Document doc = new Document().setRootElement(foo);
outputter.output(doc, System.out);

This emits:

<?xml version="1.0" encoding="US-ASCII"?>
<foo msg="he will &#34;x&#34; this if needed" />

(I'd still give the XML spec a once-over before doing this.)

McDowell
A: 

I wonder how you serialize the XML--to a string, a stream, etc. You can post-process your output to replace general entity references with their numeric equivalents, e.g.,

sed 's/&lt;/\&#60;/g; s/&gt;/\&#62;/g; s/&amp;/\&#38;/g; s/&apos;/\&#39;/g; s/&quot;/\&#34;/g'

or

xmlResultString = xmlResultString.replace("&lt;", "&#60;"); // etc. for the other entities; Strings are immutable, so keep the result

There are exactly 5 pre-defined general entities in XML (http://www.w3.org/TR/REC-xml/#sec-predefined-ent) and you can safely perform this as a textual replacement. There is no danger that it modifies anything except the references (well, maybe in comments, PIs, and CDATA sections, but it doesn't sound like your scenario uses them, or that the target even accepts them).
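Spelled out for all five entities, the Java variant might look like the sketch below. The class name NumericEntities and the method toNumeric are invented for this example, and it assumes the serialized XML is already available as a String.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class NumericEntities {
    // The five predefined XML entities and their decimal references.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("&lt;", "&#60;");
        REPLACEMENTS.put("&gt;", "&#62;");
        REPLACEMENTS.put("&amp;", "&#38;");
        REPLACEMENTS.put("&apos;", "&#39;");
        REPLACEMENTS.put("&quot;", "&#34;");
    }

    private NumericEntities() {}

    /** Replaces predefined entity references with numeric references. */
    public static String toNumeric(String xml) {
        String result = xml;
        for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
            // String.replace is a literal (non-regex) replacement and
            // returns a new string, so the result must be reassigned.
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<field value=\"he will &quot;x&quot; this if needed\"/>";
        // prints <field value="he will &#34;x&#34; this if needed"/>
        System.out.println(toNumeric(xml));
    }
}
```

Using replace rather than replaceAll avoids regex handling altogether, which is enough here because the five search strings are fixed literals.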

I agree with Mark that your target application is not a conforming XML processor. At least it comes with documentation that states explicitly where it diverges from XML. I believe the Recommendation (link above) disagrees with Christopher's comment, though it's irrelevant to OP's question as his target declares its non-conformance to the Recommendation.

Ari.

iter
If only it came with documentation...this was discovered through trial and error. Thanks for the suggestion.
Marshmellow1328