Hello!
I'm currently working on a MFC program that specifically has to work with UTF-8. At some point, I have to write UTF-8 data into a file; to do that, I'm using CFiles and CStrings.
When I get to write utf-8 (russian characters, to be more precise) data into a file, the output looks like
Ðàñïå÷àòàíî:
Ñèñòåìà
Ïðîèçâîäñòâî
and etc. This is assurely not utf-8. To read this data properly, I have to change my system settings; changing non ASCII characters to a russian encoding table does work, but then all my latin based non-ascii characters get to fail. Anyway, that's how I do it.
CFile CSVFile( m_sCible, CFile::modeCreate|CFile::modeWrite);
CString sWorkingLine;
//Add stuff into sWorkingline
CSVFile.Write(sWorkingLine,sWorkingLine.GetLength());
//Clean sWorkingline and start over
Am I missing something? Shall I use something else instead? Is there some kind of catch I've missed? I'll be tuned in for your wisdom and experience, fellow programmers.
EDIT: Of course, as I just asked a question, I finally find something which might be interesting, that can be found here. Thought I might share it.
EDIT 2:
Okay, so I added the BOM to my file, which now contains chineese character, probably because I didn't convert my line to UTF-8. To add the bom I did...
char BOM[3]={0xEF, 0xBB, 0xBF};
CSVFile.Write(BOM,3);
And after that, I added...
TCHAR TestLine;
//Convert the line to UTF-8 multibyte.
WideCharToMultiByte (CP_UTF8,0,sWorkingLine,sWorkingLine.GetLength(),TestLine,strlen(TestLine)+1,NULL,NULL);
//Add the line to file.
CSVFile.Write(TestLine,strlen(TestLine)+1);
But then I cannot compile, as I don't really know how to get the length of TestLine. strlen doesn't seem to accept TCHAR. Fixed, used a static lenght of 1000 instead.
EDIT 3:
So, I added this code...
wchar_t NewLine[1000];
wcscpy( NewLine, CT2CW( (LPCTSTR) sWorkingLine ));
TCHAR* TCHARBuf = new TCHAR[1000];
//Convert the line to UTF-8 multibyte.
WideCharToMultiByte (CP_UTF8,0,NewLine,1000,TCHARBuf,1000,NULL,NULL);
//Find how many characters we have to add
size_t size = 0;
HRESULT hr = StringCchLength(TCHARBuf, MAX_PATH, &size);
//Add the line to the file
CSVFile.Write(TCHARBuf,size);
It compiles fine, but when I go look at my new file, it's exactly the same as when I didn't have all this new code (ex : Ðàñïå÷àòàíî:). It feels like I didn't do a step forward, although I guess only a small thing is what separates me from victory.
EDIT 4:
I removed previously added code, as Nate asked, and I decided to use his code instead, meaning that now, when I get to add my line, I have...
CT2CA outputString(sWorkingLine, CP_UTF8);
//Add line to file.
CSVFile.Write(outputString,::strlen(outputString));
Everything compiles fine, but the russian characters are shown as ???????. Getting closer, but still not that. Btw, I'd like to thank everyone who tried/tries to help me, it is MUCH appreciated. I've been stuck on this for a while now, I can't wait for this problem to be gone.
FINAL EDIT (I hope) By changing the way I first got my UTF-8 characters (I reencoded without really knowing), which was erroneous with my new way of outputting the text, I got acceptable results. By adding the UTF-8 BOM char at the beginning of my file, it could be read as Unicode in other programs, like Excel.
Hurray! Thank you everyone!