
Hi,

I have a question about sending XML as a string in a web service. One of our providers has developed a web service that we need to use. Their web service is basically just a transport mechanism for their own request/response messages: e.g. the class MyRequest is serialized to an XML string using JAXB and passed to a setRequest method in their web service. Maybe that was the easiest way for them, or maybe they wanted high transparency in their application; I don't know.

Anyhow. Here is my question.

If I have a web service that uses the character encoding ISO-8859-1, but the serialized XML uses UTF-8 (or any other encoding that supports more characters than ISO-8859-1), will the characters always be serialized and deserialized correctly? Or will I have to send information about the string's contents (such as its encoding)? And if so, how can I do that?

The server side of the web service is written in .NET. How is the compatibility between Java and .NET? Are there encodings in .NET that aren't supported in Java, or vice versa?

/ Andreas

A: 

If they implement the web service correctly (and you do too), then you don't need to worry about character encoding, because:

  1. (well-formed) XML has built-in metadata (the encoding declaration and, where present, a byte-order mark) that lets a parser determine exactly which character encoding was used, and
  2. XML allows any Unicode character to be represented in any encoding, thanks to numeric character references (e.g. &#8364; for the euro sign)
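A minimal sketch of point 2, using only the JDK's built-in DOM and javax.xml.transform APIs (the class name Latin1RoundTrip and the sample text are made up for illustration). The euro sign does not exist in ISO-8859-1, so the serializer has to write it as a numeric character reference, and a standard parser turns that reference back into the same character:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Latin1RoundTrip {

    // Serialize a DOM document to bytes in the requested output encoding.
    static byte[] serialize(Document doc, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Parse raw bytes; the parser picks the encoding from the XML declaration.
    static Document parse(byte[] bytes) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        // "Pris: 10 €, åäö": å, ä, ö exist in ISO-8859-1, € does not
        String text = "Pris: 10 \u20AC, \u00E5\u00E4\u00F6";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("msg");
        root.setTextContent(text);
        doc.appendChild(root);

        byte[] latin1 = serialize(doc, "ISO-8859-1");
        // The euro sign appears as a character reference in the serialized text
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
        String back = parse(latin1).getDocumentElement().getTextContent();
        System.out.println("round trip ok: " + text.equals(back));
    }
}
```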

So, to summarize: make sure both sides handle their text in a Unicode-capable environment (C# and Java are fine for this) and use proper XML libraries (both environments come with them). As long as you don't mess with the encoding manually, you should be fine.
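As a rough illustration of what "messing it up manually" can look like, here is a small sketch (the class and helper names are hypothetical) in which bytes are turned into a String with a guessed charset instead of letting the XML parser read the declared encoding:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ManualDecodingPitfall {

    // Encode with one charset and decode with another, as happens when code
    // converts raw bytes to a String "by hand" with the wrong charset.
    static String recode(String text, Charset encodeAs, Charset decodeAs) {
        return new String(text.getBytes(encodeAs), decodeAs);
    }

    public static void main(String[] args) {
        String original = "\u00E5\u00E4\u00F6"; // "åäö"
        String ok = recode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String broken = recode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(ok + " equal: " + original.equals(ok));
        // Each two-byte UTF-8 sequence is misread as two Latin-1 characters
        System.out.println(broken + " equal: " + original.equals(broken));
    }
}
```

The usual remedy is to hand the raw bytes (e.g. an InputStream) to the XML parser and let it read the encoding declaration, rather than decoding them yourself.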

Joachim Sauer
Okay, so if the web service uses the character encoding ISO-8859-1 and the enclosed serialized XML uses UTF-8, the enclosed string will be escaped. Is that correct? What if the enclosed serialized string is wrapped in a CDATA section? I guess that will still be valid since it is character data, right?
Not really. XML inside other XML doesn't have its own encoding, since it's simply Unicode text inside the outer XML (and as such has the same encoding as the outer document). Any characters not representable in the outer XML's encoding will be represented by numeric character references.
Joachim Sauer
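The point about XML inside XML can be checked with a sketch like the one below. It assumes the provider's envelope simply places the JAXB output in an element's text content; the element name request and the class name NestedXmlPayload are hypothetical. The inner string's encoding="UTF-8" declaration is just text to the outer document: its markup gets escaped (&lt; and so on), and the euro sign becomes a character reference because the outer document is written as ISO-8859-1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NestedXmlPayload {

    // Put the inner XML string into a (hypothetical) <request> element as
    // plain text and serialize the outer document in the given encoding.
    static byte[] wrap(String innerXml, String outerEncoding) throws Exception {
        Document outer = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element request = outer.createElement("request");
        request.setTextContent(innerXml); // '<', '&' etc. are escaped on output
        outer.appendChild(request);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, outerEncoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(outer), new StreamResult(out));
        return out.toByteArray();
    }

    // Parse the outer document and return the inner XML exactly as text.
    static String unwrap(byte[] outerBytes) throws Exception {
        Document outer = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(outerBytes));
        return outer.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String inner = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><myRequest>10 \u20AC</myRequest>";
        byte[] wire = wrap(inner, "ISO-8859-1");
        System.out.println(new String(wire, StandardCharsets.ISO_8859_1));
        System.out.println("inner XML unchanged: " + inner.equals(unwrap(wire)));
    }
}
```

On the receiving side the unwrapped value is already a String, so it would typically be parsed from a character stream (e.g. new InputSource(new StringReader(xml))); in that case the parser reads the characters directly and disregards the declared encoding.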