I am currently developing an app for a customer in China. Customers in China mostly set their OS encoding to GB2312, so I need to write a text file encoded in GB2312.
- I use std::ofstream to write the file.
- I compile my application in MBCS mode, not Unicode.
- I use the following code to convert a CString to std::string, and write it to the file using ofstream:
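std::string Utils::ToString(const CString& cString) {
    /* Only valid in an MBCS build, where LPCTSTR is const char*.
       In a Unicode build LPCTSTR is const wchar_t*, which does not convert to std::string. */
    return std::string(static_cast<LPCTSTR>(cString));
}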
To my surprise, it just works. I thought I would need to use at least wstring, so I did some investigation.
Here is how MBCS.txt was generated:
- I try to print a single character, 脚 (its GB2312 value is 0xBDC5).
- When I use CString to carry this character, its length is 2.
- When I use Utils::ToString to perform conversion to std::string, the returned string length is 2.
- I write the string to the file using std::ofstream.
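The steps above can be reproduced without MFC in a minimal sketch like the following. The helper names `write_bytes`, `read_bytes`, and `jiao_gb2312` are hypothetical and not part of my application; the sketch assumes the two GB2312 bytes for 脚 are 0xBD followed by 0xC5, as stated above, and builds them by hand so the encoding of the source file does not matter.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: write the raw bytes of `text` to `path` unchanged.
// Binary mode avoids newline translation on Windows.
bool write_bytes(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary);
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    out.flush();
    return static_cast<bool>(out);
}

// Hypothetical helper: read the whole file back as raw bytes.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// The two GB2312 bytes for the character, in the order they are stored
// in the string: 0xBD first, 0xC5 second.
std::string jiao_gb2312() {
    const char bytes[] = {static_cast<char>(0xBD), static_cast<char>(0xC5)};
    return std::string(bytes, sizeof bytes);
}

// Round trip: the file should contain exactly the bytes of the string,
// in the same order.
void check_round_trip(const std::string& path) {
    const std::string text = jiao_gb2312();
    assert(write_bytes(path, text));
    assert(read_bytes(path) == text);
}
```

Calling `check_round_trip("MBCS.txt")` writes the two bytes and reads them back. The string length is 2 because std::string counts bytes, not characters.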
My questions are:
- When I examine MBCS.txt using a hex editor, the bytes are displayed as BD (first byte) and C5 (second byte). But I am using a little-endian machine. Shouldn't the hex editor show C5 first and BD second? I checked Wikipedia, and GB2312 doesn't seem to specify an endianness.
- It seems that using std::string + CString works fine for my case. In what cases will the above approach not work, and when should I start using wstring?
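To make both questions concrete, here is a small sketch, not taken from my application. It contrasts the in-memory bytes of a 16-bit integer, whose order depends on the machine, with a std::string, whose bytes are stored and written in index order. It also shows that the string length counts bytes rather than characters. The function names are hypothetical, and `count_gb2312_chars` assumes well-formed input where any byte of 0x80 or above starts a two-byte character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// In-memory bytes of a 16-bit integer, lowest address first.
std::string bytes_of_u16(std::uint16_t value) {
    char raw[sizeof value];
    std::memcpy(raw, &value, sizeof value);
    return std::string(raw, sizeof raw);
}

// True when the machine stores the low-order byte of an integer first.
bool is_little_endian() {
    return static_cast<unsigned char>(bytes_of_u16(0x0001)[0]) == 0x01;
}

// Counts characters in a GB2312 (EUC-CN style) byte string.
// Assumption: input is well formed; bytes below 0x80 are single-byte
// ASCII, and any other byte is the lead byte of a two-byte character.
std::size_t count_gb2312_chars(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++count) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        i += (b < 0x80) ? 1 : 2;
    }
    return count;
}

// Sanity check: one two-byte character is 2 bytes but 1 character.
void check_counts() {
    const std::string text = "\xBD\xC5";
    assert(text.size() == 2);
    assert(count_gb2312_chars(text) == 1);
}
```

On a little-endian machine, `bytes_of_u16(0xBDC5)` yields C5 then BD, while the string "\xBD\xC5" is always BD then C5, whatever the machine.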