Unicode in C++

A:

Here is a checklist for Windows programming:

All strings enclosed in _T("my string")
strlen() etc. functions replaced with _tcslen() etc.
Use LPTSTR and LPCTSTR instead of char * and const char *
When starting new projects in Dev Studio, religiously make sure the Unicode option is selected in your project properties.
For C++ strings, use std::wstring instead of std::string

Adam Pierce 2008-09-11 01:33:53

Do not use "T" strings, chars and functions unless you intend to do both Unicode and ANSI builds. If you only intend to do Unicode builds, just do regular wide character stuff: L"my wide string", wcslen(L"my string"), etc.

1800 INFORMATION 2008-09-11 01:52:13

Agreed: only use _T macros if you want generic text, i.e., the ability to build the same code for both Unicode and ANSI/MBCS.

James D 2008-09-11 02:23:43

In case you want to do both Unicode and ANSI for C++ strings, use something like typedef std::basic_string<TCHAR> tString;

Serge 2008-09-11 07:10:53

Ah yes, I always do

#ifdef _UNICODE
#define tstring std::wstring
#else
#define tstring std::string
#endif

but I like your way better, Serge.

Adam Pierce 2008-09-17 04:38:17

+13 A:

Use ICU for dealing with your data (or a similar library)
In your own data store, make sure everything is stored in the same encoding
Make sure you always use your Unicode library for mundane tasks like string length, capitalization status, etc. Never use standard library builtins like isalpha unless that is the definition you want.
I can't say it enough: _**never iterate over the indices of a string if you care about correctness, always use your unicode library for this.**_

hazzen 2008-09-11 01:37:17

A:

Use IBM's International Components for Unicode

Joe Schneider 2008-09-11 01:39:07

+2 A:

Look at http://stackoverflow.com/questions/11635/case-insensitive-string-comparison-in-c

That question has a link to the Microsoft documentation on Unicode: http://msdn.microsoft.com/en-us/library/cc194799.aspx

If you look at the left-hand navigation pane on MSDN next to that article, you should find a lot of information about Unicode functions. It is part of a chapter on "Encoding Characters" (http://msdn.microsoft.com/en-us/library/cc194786.aspx)

It has the following subsections:

The Code-Page Model
Double-Byte Character Sets in Windows
Unicode
Compatibility Issues in Mixed Environments
Unicode Data Conversion
Migrating Windows-Based Programs to Unicode
Summary

amdfan 2008-09-11 01:40:08

+2 A:

Our company (and others) use the open-source International Components for Unicode (ICU) library, originally developed by Taligent.

It handles strings, locales, conversions, dates/times, collation, transformations, etc.

Start with the ICU User Guide.

jschroedl 2008-09-11 01:46:51
