views: 132
answers: 3
I am planning to develop a Windows-based client app and a platform-agnostic server app. The client app basically sends messages to the server app; it can send messages in English or in other languages. Should I be using Unicode for encoding messages in my client app? What is the general practice among applications involved in network communication? My client and server apps will use a custom protocol for exchanging messages over TCP/IP. Which Unicode encodings do the Windows and UNIX platforms support by default? Should I also be exchanging the encoding type in my protocol, for decoding the Unicode messages? Please advise.

+3  A: 

Look at UTF-8, the encoding of Unicode in 8-bit bytes that is efficient for English and Western languages.

It is always a good idea to exchange the encoding type, in case you want to support something else at a later stage.

UTF-8 is supported by all major operating systems and programming languages.
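
As a rough sketch of what carrying the encoding name in a custom frame could look like (Python used only for illustration; the frame layout and function names are made up for this example, not part of any existing protocol):

    import socket
    import struct

    def send_message(sock: socket.socket, text: str, encoding: str = "utf-8") -> None:
        """Hypothetical frame: 1-byte encoding-name length, the encoding name
        (ASCII), a 4-byte big-endian payload length, then the encoded payload."""
        name = encoding.encode("ascii")
        payload = text.encode(encoding)
        frame = struct.pack("!B", len(name)) + name + struct.pack("!I", len(payload)) + payload
        sock.sendall(frame)

    def recv_message(sock: socket.socket) -> str:
        name_len = _recv_exact(sock, 1)[0]
        encoding = _recv_exact(sock, name_len).decode("ascii")
        payload_len = struct.unpack("!I", _recv_exact(sock, 4))[0]
        return _recv_exact(sock, payload_len).decode(encoding)

    def _recv_exact(sock: socket.socket, n: int) -> bytes:
        # TCP is a byte stream, so keep reading until the full field arrives.
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("socket closed mid-frame")
            buf += chunk
        return buf

With a frame like this the receiver never has to guess the encoding, and you can switch to something other than UTF-8 later without breaking old peers.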

some
Indeed. Outside of Windows, UTF-8 is *the* Unicode encoding to use.
Rob Kennedy
Inside of Windows too!
bzlm
+1  A: 

If you control both the server and the client, I'd pick one encoding and stick with it.

I would suggest either UTF-8 (most efficient for English and Western languages) or UTF-16 (make sure to choose a byte order).
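
To see why the choice matters, a small illustration (Python 3.8+ assumed) of how the same short string comes out in each candidate encoding:

    # The same text, different encodings: byte counts and byte order differ,
    # which is why the choice must be fixed up front.
    text = "héllo"

    print(text.encode("utf-8").hex(" "))      # 68 c3 a9 6c 6c 6f  (6 bytes)
    print(text.encode("utf-16-le").hex(" "))  # 68 00 e9 00 6c 00 6c 00 6f 00
    print(text.encode("utf-16-be").hex(" "))  # 00 68 00 e9 00 6c 00 6c 00 6f
    print(text.encode("utf-16").hex(" "))     # starts with a BOM (ff fe on little-endian builds)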

Adam Tegen
A: 

You can use whatever encoding you want, you just have to be careful about things like byte order. Windows internally uses UTF-16 (little-endian), so if you expect most systems to be Windows, then you should probably go with that. Otherwise, I'd recommend UTF-8, which doesn't have byte-order issues to worry about.

If you do go with UTF-16 (or UTF-32, which I definitely would not recommend), spell out in no uncertain terms what the endianness of the data on the wire is. Then, for every client which reads or writes a Unicode character to a network socket, convert from the platform's native endianness to the network endianness - this is either a no-op or a byte swap.
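
A minimal sketch of that conversion step, assuming the wire format is big-endian UTF-16 (the helper names here are just for illustration):

    import sys

    def utf16_to_wire(text: str) -> bytes:
        # Always emit big-endian UTF-16 on the wire, regardless of host order.
        return text.encode("utf-16-be")

    def utf16_from_wire(data: bytes) -> str:
        return data.decode("utf-16-be")

    # On a little-endian host (typical Windows/x86 machines) the encode step is
    # the byte swap; on a big-endian host it is effectively a no-op.
    print(sys.byteorder, utf16_to_wire("hi").hex(" "))  # e.g. little 00 68 00 69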

Adam Rosenfield