views: 55
answers: 2

How do you convert an std::string encoded in extended ASCII to UTF-8 using Microsoft Visual Studio 2005?

I'm using Google Protocol Buffers, and it complains about non-UTF-8 characters in my string if I pass it in without conversion, which is true...

+2  A: 

Use `MultiByteToWideChar` to convert your string to UTF-16, then use `WideCharToMultiByte` to convert it to UTF-8.
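
A minimal sketch of that two-hop conversion, assuming the input is in the system's default ANSI code page (`CP_ACP`; pass 1252 explicitly for Windows-1252, as the comments below suggest). Error handling is omitted and the helper name is just illustrative:

    #include <windows.h>
    #include <string>

    // Sketch only: the source code page (CP_ACP here) is an assumption.
    std::string AnsiToUtf8(const std::string& ansi)
    {
        // First hop: system-encoded bytes -> UTF-16.
        int wlen = MultiByteToWideChar(CP_ACP, 0, ansi.c_str(), -1, NULL, 0);
        std::wstring wide(wlen, L'\0');
        MultiByteToWideChar(CP_ACP, 0, ansi.c_str(), -1, &wide[0], wlen);

        // Second hop: UTF-16 -> UTF-8.
        int ulen = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), -1, NULL, 0, NULL, NULL);
        std::string utf8(ulen, '\0');
        WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), -1, &utf8[0], ulen, NULL, NULL);

        utf8.resize(ulen - 1);  // drop the trailing NUL counted by the -1 length
        return utf8;
    }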

sbi
`MultiByteToWideChar` converts UTF-8 to UTF-16; that is the wrong direction.
Andrey
@Andrey: Last time I looked into the issue (which I freely admit to be long ago), I found no other way than the route via UTF-16.
sbi
ASCII chars with codes > 127 are invalid in UTF-8, and "MultiByte" stands for UTF-8. This will not work, I tell you, just try. Maybe you (or I :) ) misunderstood the question.
Andrey
@Andrey: `MultiByteToWideChar()` can also convert system-encoded text to UTF-16. Is that the source of our misunderstanding? TTBOMK, this is all the Win32 API offers to convert between system encoding and UTF-8.
sbi
@Andrey, `MultiByteToWideChar` can convert from *many* code pages, not just UTF-8. That's what its first parameter indicates.
Rob Kennedy
@sbi: You are right, this will work if you pass 1252 as the code page, but I still like my method; it is faster :)
Andrey
Thanks, it worked!
foke
+1  A: 

Let's assume that the mysterious "extended ASCII" is just Latin-1. Then use the two-byte bit pattern from the Wikipedia UTF-8 article:

110x xxxx 10xx xxxx

Since your code points are only 00..FF, this reduces to: 1100 00xx 10xx xxxx.

The conversion algorithm is as follows: if the char code is < 0x80, just copy it as is; if it is >= 0x80, the first byte is `0xC0 | ((x & 0xC0) >> 6)` and the second is `0x80 | (x & 0x3F)` (see the sketch below).
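
A short sketch of that loop, assuming the input really is Latin-1 (so each byte's value equals its Unicode code point); the function name is just illustrative:

    #include <string>

    // Sketch: valid only if the input is Latin-1.
    std::string Latin1ToUtf8(const std::string& latin1)
    {
        std::string utf8;
        for (std::string::size_type i = 0; i < latin1.size(); ++i)
        {
            unsigned char x = static_cast<unsigned char>(latin1[i]);
            if (x < 0x80)
            {
                utf8 += static_cast<char>(x);                  // ASCII: copy as is
            }
            else
            {
                utf8 += static_cast<char>(0xC0 | (x >> 6));    // 110000xx
                utf8 += static_cast<char>(0x80 | (x & 0x3F));  // 10xxxxxx
            }
        }
        return utf8;
    }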

Andrey
What "mask from Wikipedia" are you talking about? You have not generated valid UTF-8, either; UTF-8 does not have any zero bytes in it.
Rob Kennedy