I'm currently working on an MFC program that has to work with UTF-8. At some point I have to write UTF-8 data into a file; to do that, I'm using CFile and CString.
When I write UTF-8 data (Russian characters, to be more precise) into a file, the output looks like
and so on. This is definitely not UTF-8. To read this data properly, I have to change my system settings; switching the code page for non-ASCII characters to a Russian encoding table does work, but then all my Latin-based non-ASCII characters break. Anyway, here's how I write the file.
CFile CSVFile( m_sCible, CFile::modeCreate|CFile::modeWrite);
CString sWorkingLine;
//Add stuff into sWorkingline
//Clean sWorkingline and start over
Am I missing something? Should I use something else instead? Is there some kind of catch I've overlooked? I'll be tuned in for your wisdom and experience, fellow programmers.
EDIT: Of course, just after asking the question, I found something that might be interesting, which can be found here. Thought I might share it.
Okay, so I added the BOM to my file, which now contains Chinese characters, probably because I didn't convert my line to UTF-8. To add the BOM I did...
unsigned char BOM[3] = {0xEF, 0xBB, 0xBF}; // UTF-8 byte order mark; unsigned because these values don't fit in a signed char
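For illustration, here is a minimal portable sketch of the BOM step. It uses a std::ostream instead of CFile so it can be tried anywhere; the helper names writeUtf8Bom and withUtf8Bom are made up for this example and are not part of MFC.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes the three-byte UTF-8 byte order mark.
// unsigned char is used because 0xEF, 0xBB and 0xBF do not fit in a signed char.
void writeUtf8Bom(std::ostream& out) {
    static const unsigned char kBom[3] = {0xEF, 0xBB, 0xBF};
    out.write(reinterpret_cast<const char*>(kBom), sizeof kBom);
}

// Hypothetical helper: returns the BOM followed by text that is
// already encoded as UTF-8 (no conversion happens here).
std::string withUtf8Bom(const std::string& utf8Text) {
    std::ostringstream buffer;
    writeUtf8Bom(buffer);
    buffer << utf8Text;
    return buffer.str();
}
```

With CFile, the same three bytes would go out through CFile::Write(BOM, 3) before any line is written. The BOM only labels the file; the lines written after it still have to be real UTF-8 bytes.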
And after that, I added...
TCHAR TestLine;
//Convert the line to UTF-8 multibyte.
WideCharToMultiByte (CP_UTF8,0,sWorkingLine,sWorkingLine.GetLength(),TestLine,strlen(TestLine)+1,NULL,NULL);
//Add the line to file.
But then I cannot compile, as I don't really know how to get the length of TestLine. strlen doesn't seem to accept TCHAR. Fixed: I used a static length of 1000 instead.
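A note on the length: the number of UTF-8 bytes cannot be guessed from the number of characters, because each Russian letter takes two bytes. The sketch below is a hand-written encoder standing in for the Windows call, purely to show the byte counts; utf16ToUtf8 and appendUtf8 are hypothetical names, and the input is assumed to be UTF-16 (which is what wide strings hold on Windows).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point to `out` as UTF-8 (1 to 4 bytes).
static void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts UTF-16 text to UTF-8.
// The size() of the result is the number of bytes to write to the file.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        if (isHigh && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // lone surrogate: emit the replacement character
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

With the real API, the usual pattern is to call WideCharToMultiByte once with a NULL output buffer and an output size of 0, which returns the required byte count, then allocate that many bytes and call it again. That avoids both strlen on an uninitialized buffer and a fixed size like 1000.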
So, I added this code...
wchar_t NewLine[1000];
wcscpy( NewLine, CT2CW( (LPCTSTR) sWorkingLine ));
TCHAR* TCHARBuf = new TCHAR[1000];
//Convert the line to UTF-8 multibyte.
WideCharToMultiByte (CP_UTF8,0,NewLine,1000,TCHARBuf,1000,NULL,NULL);
//Find how many characters we have to add
size_t size = 0;
HRESULT hr = StringCchLength(TCHARBuf, MAX_PATH, &size);
//Add the line to the file
It compiles fine, but when I look at my new file, it's exactly the same as before I added all this new code (e.g. Ðàñïå÷àòàíî:). It feels like I haven't taken a step forward, although I guess only a small thing separates me from victory.
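One observation about that sample: Ðàñïå÷àòàíî: is exactly what the Windows-1251 bytes for Распечатано: look like when a viewer displays them as Latin-1/Windows-1252. That suggests the bytes reaching the file are still in the Russian ANSI code page and were never turned into UTF-8. The sketch below shows the conversion that is missing, in portable C++; cp1251ToUtf8 is a hypothetical helper, and it deliberately handles only ASCII and the basic Russian letters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts Windows-1251 bytes to UTF-8.
// Assumption: only ASCII and the basic Russian letters are handled
// (bytes 0xC0..0xFF, which map to U+0410..U+044F); any other byte becomes '?'.
std::string cp1251ToUtf8(const std::string& in) {
    std::string out;
    for (unsigned char b : in) {
        if (b < 0x80) {
            // ASCII is identical in both encodings.
            out += static_cast<char>(b);
        } else if (b >= 0xC0) {
            // Map the byte to its code point, then emit two UTF-8 bytes.
            const unsigned cp = 0x0410 + (b - 0xC0);
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += '?'; // not covered by this sketch
        }
    }
    return out;
}
```

On Windows the same round trip would normally go through MultiByteToWideChar with code page 1251 to get a wide string, followed by WideCharToMultiByte with CP_UTF8, rather than a hand-written table.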
I removed the previously added code, as Nate asked, and decided to use his code instead, meaning that now, when I add my line, I have...
CT2CA outputString(sWorkingLine, CP_UTF8);
//Add line to file.
Everything compiles fine, but the Russian characters are shown as ???????. Getting closer, but still not there. Btw, I'd like to thank everyone who tried/tries to help me; it is MUCH appreciated. I've been stuck on this for a while now, and I can't wait for this problem to be gone.
FINAL EDIT (I hope): The way I first obtained my UTF-8 characters was wrong (I was re-encoding them without really knowing it), and that didn't work with my new way of outputting the text. After changing it, I got acceptable results. With the UTF-8 BOM added at the beginning of the file, it can be read as Unicode in other programs, like Excel.
Hurray! Thank you everyone!