Hello everybody!

I am currently having trouble with German umlaut values in an XML document I received.

It displays / saves the value as "Ã¼" instead of "ü".

The XML encoding is declared as UTF-8, which should be able to represent umlauts.

Also, I couldn't find any option to set a locale on the SAX parser.

Is there any other way I can make the values save correctly?

btw: I am using Eclipse as my IDE.

Any help is much appreciated!

Thanks in advance!

+1  A: 

The XML is encoded in UTF-8, but you are decoding it with ISO-8859-1.

Try to use InputStream and other "binary"-oriented APIs for XML. Avoid using a Reader, or trying to convert from byte[] to a String before parsing XML. You are much more likely to mess up the character encoding than the parser is.
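
As a rough illustration of the difference, here is a minimal sketch using the JDK's built-in SAX parser. The class name `UmlautDemo`, the `TextCollector` handler, and the sample document are made up for this example and are not from the question. The same UTF-8 bytes are parsed twice: once from an InputStream, so the parser reads the encoding from the XML declaration, and once through a Reader that was wrongly set up with ISO-8859-1.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class UmlautDemo {

    // Hypothetical handler that simply collects all character data.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    static String parse(InputSource source) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.text.toString();
    }

    // Byte stream: the parser detects the encoding from the XML declaration.
    static String parseFromStream(byte[] xml) throws Exception {
        try (InputStream in = new ByteArrayInputStream(xml)) {
            return parse(new InputSource(in));
        }
    }

    // Character stream with the wrong charset: the bytes are already decoded
    // as ISO-8859-1 before the parser sees them, so the declaration is ignored.
    static String parseFromLatin1Reader(byte[] xml) throws Exception {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xml), StandardCharsets.ISO_8859_1)) {
            return parse(new InputSource(reader));
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample document (an assumption for this sketch), stored as UTF-8 bytes.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>M\u00fcller</name>"
                .getBytes(StandardCharsets.UTF_8);

        String good = parseFromStream(xml);       // "M\u00fcller"
        String bad = parseFromLatin1Reader(xml);  // "M\u00c3\u00bcller", the mojibake

        System.out.println(good.equals("M\u00fcller"));
        System.out.println(bad.equals("M\u00c3\u00bcller"));
    }
}
```

If a Reader is unavoidable, it has to be created with the charset the file is actually stored in (here `StandardCharsets.UTF_8`); handing the parser the raw stream avoids having to make that choice at all.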

erickson
Yep, using an InputStream instead of a Reader worked like a charm! Thanks a lot for the fast response!
Shaharyar
A: 

Setting the encoding to UTF-8 in the XML declaration is one thing; the physical encoding of the XML document is another. For example, you can have an XML file that says <?xml version="1.0" encoding="utf-8"?> while the file itself is still saved in an ANSI code page (or some other encoding).
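
A small sketch of that mismatch, assuming the JDK's bundled SAX parser: the class name `DeclaredVsActual` and the sample document are invented for illustration, and ISO-8859-1 stands in for "ANSI" here because it is guaranteed to be available and encodes "ü" as the same single byte (0xFC) as windows-1252. The document always declares utf-8, but it is written out once as real UTF-8 and once as ISO-8859-1.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class DeclaredVsActual {

    // Sample document (an assumption for this sketch); it always claims utf-8.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?><name>M\u00fcller</name>";

    // Returns true if the bytes parse cleanly, false if the parser rejects them.
    static boolean parses(byte[] bytes) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (Exception e) {
            // With the JDK parser, a lone 0xFC byte is not a valid UTF-8 sequence.
            return false;
        }
    }

    public static void main(String[] args) {
        // Declaration and physical encoding agree.
        System.out.println(parses(XML.getBytes(StandardCharsets.UTF_8)));
        // Declaration says utf-8, but the bytes are ISO-8859-1.
        System.out.println(parses(XML.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

In this setup the second document should be rejected by the parser rather than silently producing wrong characters, which is a different symptom from the "Ã¼" described in the question.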

The Chairman
That would be malformed XML. The XML is fine in this case. The code reading it is wrong.
erickson