I'm using an API that processes my files and presents optimized output, but some special characters are not preserved, for example:

Input: äöü

Output: äöü

How do I fix this? What encoding should I use?

Many thanks for your help!

A: 

I am not sure what language you're using, but things like this occur when the encoding used to write the content does not match the encoding used to read it back in. Your output is what UTF-8 bytes look like when they are decoded with a single-byte encoding such as ISO-8859-1 or Windows-1252.

So you may want to specify explicitly which encoding to use when reading the data. You may have to experiment to find the one you actually need:

string.getBytes("UTF-8")
string.getBytes("UTF-16")
string.getBytes("UTF-16LE")
string.getBytes("UTF-16BE") 
etc...
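
To see how such a mismatch produces exactly the output shown in the question, here is a minimal Java sketch (Java only because the getBytes calls above use its syntax). It assumes the API wrote UTF-8 and something downstream decoded it as ISO-8859-1; that is a guess about your setup, and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Reproduce the problem: encode as UTF-8, then decode with the
    // wrong charset (ISO-8859-1 is an assumption, not a known fact).
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the original bytes, then decode as UTF-8.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // a-umlaut, o-umlaut, u-umlaut, written as escapes so the
        // source file's own encoding cannot interfere
        String input = "\u00e4\u00f6\u00fc";
        String garbled = garble(input);
        System.out.println(garbled.length());              // 6
        System.out.println(repair(garbled).equals(input)); // true
    }
}
```

Each of the three characters becomes two, which matches the six-character output in the question. The repair step only works while the garbled text has not been damaged further; the real fix is to read the data with the right encoding in the first place.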

Also, do some research about the system where this data is coming from. For example, a .NET service that encodes text with Encoding.Unicode produces UTF-16LE, whereas Java's "UTF-16" charset encodes as big-endian (UTF-16BE). When two such systems exchange extended characters, they may not interpret the bytes the same way.
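
The byte-order difference mentioned above can be seen directly by dumping the bytes. This is only a sketch: the ByteOrderDemo class and its helper methods are hypothetical names, and it says nothing about what your particular API actually sends.

```java
import java.nio.charset.StandardCharsets;

public class ByteOrderDemo {
    // Render bytes as upper-case hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode as little-endian, then decode as big-endian (the mismatch).
    static String decodeWithWrongOrder(String text) {
        byte[] le = text.getBytes(StandardCharsets.UTF_16LE);
        return new String(le, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String s = "\u00e4"; // a-umlaut
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // E400
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00E4
        // Java's plain "UTF-16" writes a big-endian byte-order mark first.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FEFF00E4
        // Reading LE bytes as BE yields U+E400, a private-use character.
        System.out.println(decodeWithWrongOrder(s).equals("\ue400"));   // true
    }
}
```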

Anatoly G
A: 

It really depends on what processing you are doing to your data. But in general, one powerful technique is to convert it to UTF-8 (with iconv, for example) and pass it through ASCII-capable APIs or functions. If those functions leave alone any bytes they don't recognize as ASCII, the UTF-8 is preserved, because every byte of a multi-byte UTF-8 sequence has its high bit set and so can never be mistaken for an ASCII character. That's a nice property of UTF-8.
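
As a rough illustration of that property, the Java sketch below runs UTF-8 bytes through a routine that only understands ASCII. The asciiUpper function is a hypothetical stand-in for whatever ASCII-only processing your API does, and getBytes stands in for the iconv conversion step.

```java
import java.nio.charset.StandardCharsets;

public class AsciiSafeDemo {
    // A byte-oriented routine that only knows ASCII: it upper-cases
    // the bytes for a-z and leaves every other byte untouched.
    static byte[] asciiUpper(byte[] data) {
        byte[] out = data.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] >= 'a' && out[i] <= 'z') {
                out[i] = (byte) (out[i] - 32);
            }
        }
        return out;
    }

    // Convert to UTF-8, run the ASCII-only routine, convert back.
    static String process(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(asciiUpper(utf8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "gruesse, world" with u-umlaut and sharp s, written as escapes
        String input = "gr\u00fc\u00dfe, world";
        String expected = "GR\u00fc\u00dfE, WORLD";
        System.out.println(process(input).equals(expected)); // true
    }
}
```

The non-ASCII characters come through unchanged because their UTF-8 bytes all fall outside the ASCII range the routine touches. A routine that truncates, re-encodes, or drops high-bit bytes would not be safe in this way.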

動靜能量