We have a Java application running on WebLogic Server that picks up XML messages from a JMS or MQ queue and writes them to another JMS queue. The application doesn't modify the XML content in any way. We use BEA's XMLObject to read the messages from and write them to the queues.

The XML messages contain the encoding type declarations as UTF-8.

We have an issue when the XML contains characters that are outside the normal ASCII range (the £ symbol, for example). When the message is read from the queue we can see that the £ symbol is intact; however, once we write it to the destination queue, the £ symbol is lost and is replaced with Â£ instead.

I have checked the OS level settings (locale settings) and everything seems to be fine. What else should I be checking to make sure that this doesn't happen?

+2  A: 

Without a few more specifics, I'd guess that there is a method that optionally takes an encoding somewhere that isn't specified and is defaulting to ISO-8859-1. Commonly, check anything that passes between an InputStream/OutputStream and a Reader/Writer.

For instance, an OutputStreamWriter takes an optional encoding that you could be leaving out.
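
To illustrate the point about OutputStreamWriter, here is a minimal sketch that passes the charset explicitly instead of relying on the platform default. The class name ExplicitEncoding and the helper writeUtf8 are made up for this example, and it assumes Java 8 or later for StandardCharsets and UncheckedIOException; it says nothing about how XMLObject handles encoding internally.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitEncoding {
    // Hypothetical helper: turns text into bytes through a Writer whose
    // charset is given explicitly rather than taken from the platform default.
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+00A3 is the pound sign; in UTF-8 it is the two bytes C2 A3.
        for (byte b : writeUtf8("\u00A3")) {
            System.out.printf("%02X%n", b & 0xFF);
        }
    }
}
```

The same idea applies on the reading side: InputStreamReader also accepts a charset argument, and leaving it out means the platform default is used.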

Joe Liversedge
+2  A: 

once we write it to the destination queue, the £ symbol is lost and is replaced with Â£ instead

That tells me the character is being written as UTF-8, but it's being read as if it were in a single-byte encoding like ISO-8859-1. (For any character in the range U+00A0..U+00BF, the UTF-8 encoding is the byte 0xC2 followed by a byte equal to the character's own code point, so if you encode it as UTF-8 and decode it as ISO-8859-1, you end up with the two-character sequence ÂX, where X is the original character.) I would look at the encoding settings of the receiving JMS queue.
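
The mismatch can be reproduced in a few lines. This is only a sketch of the mechanism, not of what the queue or XMLObject actually does; the class name MojibakeDemo and the method misdecode are invented for the example. It prints code points instead of the characters themselves so that the console's own encoding can't confuse the result.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes the same bytes as ISO-8859-1,
    // which is the writer/reader disagreement described above.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode("\u00A3"); // pound sign
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
        // prints U+00C2 (Â) and then U+00A3 (£)
    }
}
```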

Alan Moore
Yes. It was an issue with the encoding setting, not at the JMS queue, but at the OS level (which I thought was correct and mentioned so in my original query).
Mani
I'm glad you figured it out, and I hope you're taking the advice offered in the other replies: if you really have to do the byte/character conversions yourself, you should always specify the encoding instead of relying on the OS settings.
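
If it helps anyone checking the same thing, a quick way to see which encoding the JVM falls back to when none is specified is something like the sketch below. The class name DefaultCharsetCheck and the method defaultName are made up here; Charset.defaultCharset() and the file.encoding system property are standard, but how the default is derived from the OS settings varies by platform and JVM version.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Returns the name of the charset the JVM uses when no encoding is given,
    // for example in new String(bytes) or new OutputStreamWriter(out).
    static String defaultName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultName());
        // The system property the default is commonly derived from; may be null.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

If this reports something other than UTF-8 on the server, any conversion that omits the encoding will use that value instead.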
Alan Moore
+1  A: 
erickson