views:

2479

answers:

5
+18  Q: 

Unicode in C++

What is the best practice for Unicode processing in C++?

A: 

Here is a checklist for Windows programming:

  • Enclose all string literals in _T("my string").
  • Replace strlen() and similar functions with _tcslen() and the other generic-text equivalents.
  • Use LPTSTR and LPCTSTR instead of char * and const char *.
  • When starting new projects in Dev Studio, religiously make sure the Unicode option is selected in your project properties.
  • For C++ strings, use std::wstring instead of std::string.
Adam Pierce
Do not use "T" strings, chars and functions unless you intend to do both Unicode and ANSI builds. If you only intend to do Unicode builds, just use the regular wide-character forms: L"my wide string", wcslen(L"my string"), etc.
1800 INFORMATION
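To make the comment above concrete, here is a minimal sketch of the plain wide-character style, with no generic-text macros. The helper names wide_length and make_wide are made up for illustration and do not come from any library.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Hypothetical helper: length of a wide C string, counted in wchar_t
// units (not bytes, and not necessarily user-visible characters).
std::size_t wide_length(const wchar_t* s) {
    return std::wcslen(s);
}

// Hypothetical helper: a std::wstring built from a wide literal.
std::wstring make_wide() {
    return L"my wide string";
}
```

Note that wcslen() counts wchar_t units. On Windows wchar_t is 16 bits wide and holds UTF-16, so a character outside the Basic Multilingual Plane takes two units.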
Agreed, only use the _T macros if you want generic text, i.e., the ability to build for both Unicode and ASCII/MBCS.
James D
In case you want to do both Unicode and ANSI builds, for C++ strings use something like: typedef std::basic_string<TCHAR> tString;
Serge
Ah yes, I always do
    #ifdef _UNICODE
    #define tstring std::wstring
    #else
    #define tstring std::string
    #endif
but I like your way better, Serge.
Adam Pierce
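For reference, a rough sketch of the typedef approach discussed in the comments above. On Windows, TCHAR and _T come from <tchar.h>. The fallback definitions in the #else branch are an assumption added here so the sketch also compiles elsewhere, and greeting is a made-up function name.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <tchar.h>   // supplies TCHAR and _T on Windows toolchains
#else
// Assumption: minimal stand-ins for the <tchar.h> names, for illustration
// only, so that the sketch builds on non-Windows compilers too.
#ifdef _UNICODE
typedef wchar_t TCHAR;
#define _T(x) L##x
#else
typedef char TCHAR;
#define _T(x) x
#endif
#endif

// One string type that follows the build setting: std::wstring when
// _UNICODE is defined, std::string otherwise.
typedef std::basic_string<TCHAR> tString;

// Hypothetical example function returning a generic-text string.
tString greeting() {
    return _T("my string");
}
```

Compared with the #define version, the typedef is a real type name that respects scope, so it cannot accidentally rewrite an unrelated identifier called tstring.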
+13  A: 
  • Use ICU for dealing with your data (or a similar library)
  • In your own data store, make sure everything is stored in the same encoding
  • Make sure you always use your Unicode library for mundane tasks like string length, capitalization status, etc. Never use standard library functions like isalpha unless that is the definition you want.
  • I can't say it enough: never iterate over the indices of a string if you care about correctness; always use your Unicode library for this.
hazzen
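A small illustration of why indexing into a string goes wrong, assuming the data is valid UTF-8 held in a std::string. The function name utf8_code_points is made up for this sketch. It is not a substitute for a Unicode library and only shows that bytes, code points and visible characters are three different counts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8; it performs no validation.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte: start of a new code point
        }
    }
    return n;
}

// "café" with é as the precomposed U+00E9, written as explicit UTF-8
// bytes so the sketch does not depend on the source file encoding.
std::string precomposed() { return "caf\xC3\xA9"; }

// "e" followed by U+0301 (combining acute accent): displays as one "é".
std::string decomposed() { return "e\xCC\x81"; }
```

Here precomposed().size() is 5 bytes for 4 code points, and decomposed() is 3 bytes and 2 code points for a single visible character. Grouping code points into user-perceived characters is exactly the kind of job to leave to ICU or a similar library.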
A: 

Use IBM's International Components for Unicode

Joe Schneider
+2  A: 

Look at http://stackoverflow.com/questions/11635/case-insensitive-string-comparison-in-c

That question has a link to the Microsoft documentation on Unicode: http://msdn.microsoft.com/en-us/library/cc194799.aspx

If you look at the left-hand navigation pane on MSDN next to that article, you should find a lot of information pertaining to Unicode functions. It is part of a chapter on "Encoding Characters" (http://msdn.microsoft.com/en-us/library/cc194786.aspx).

It has the following subsections:

  • The Code-Page Model
  • Double-Byte Character Sets in Windows
  • Unicode
  • Compatibility Issues in Mixed Environments
  • Unicode Data Conversion
  • Migrating Windows-Based Programs to Unicode
  • Summary
amdfan
+2  A: 

Our company (and others) use the open source International Components for Unicode (ICU) library originally developed by Taligent.

It handles strings, locales, conversions, date/times, collation, transformations, etc.

Start with the ICU User Guide.

jschroedl