views: 55
answers: 2

How do you convert an std::string encoded in extended ASCII to UTF-8 using Microsoft Visual Studio 2005?

I'm using Google Protocol Buffers, and it complains about non-UTF-8 characters in my string if I pass it in without conversion, which is true...

+2  A: 

Use `MultiByteToWideChar` to convert your string to UTF-16, then use `WideCharToMultiByte` to convert it to UTF-8.
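
A minimal sketch of that two-hop conversion, assuming the input is in the system's default ANSI code page (`CP_ACP`; pass 1252 explicitly for Windows-1252, as the comments below suggest). Error handling is omitted and the helper name is just illustrative:

    #include <windows.h>
    #include <string>

    // Sketch only: the source code page (CP_ACP here) is an assumption.
    std::string AnsiToUtf8(const std::string& ansi)
    {
        // First hop: system-encoded bytes -> UTF-16.
        int wlen = MultiByteToWideChar(CP_ACP, 0, ansi.c_str(), -1, NULL, 0);
        std::wstring wide(wlen, L'\0');
        MultiByteToWideChar(CP_ACP, 0, ansi.c_str(), -1, &wide[0], wlen);

        // Second hop: UTF-16 -> UTF-8.
        int ulen = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), -1, NULL, 0, NULL, NULL);
        std::string utf8(ulen, '\0');
        WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), -1, &utf8[0], ulen, NULL, NULL);

        utf8.resize(ulen - 1);  // drop the trailing NUL counted by the -1 length
        return utf8;
    }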

sbi
`MultiByteToWideChar` converts UTF-8 to UTF-16; that is the wrong direction.
Andrey
@Andrey: Last time I looked into the issue (which I freely admit to be long ago), I found no other way than the route via UTF-16.
sbi
ASCII chars with codes > 127 are invalid in UTF-8, and "MultiByte" stands for UTF-8. This will not work, I tell you, just try. Maybe you (or I :) ) misunderstood the question.
Andrey
@Andrey: `MultiByteToWideChar()` can also convert system-encoded text to UTF-16. Is that the source of our misunderstanding? TTBOMK, this is all the Win32 API offers to convert between system encoding and UTF-8.
sbi
@Andrey, `MultiByteToWideChar` can convert from *many* code pages, not just UTF-8. That's what its first parameter indicates.
Rob Kennedy
@sbi: You are right, this will work if you pass 1252 as the code page, but I still like my method; it is faster :)
Andrey
Thanks, it worked!
foke
+1  A: 

Let's assume that the mysterious "extended ASCII" is just Latin-1. Then use the two-byte bit pattern from the Wikipedia UTF-8 article:

110x xxxx 10xx xxxx

Since your code points are only 00..FF, this reduces to: 1100 00xx 10xx xxxx.

The conversion algorithm is as follows: if the char code is < 0x80, just copy it as is; if it is >= 0x80, the first byte is `0xC0 | ((x & 0xC0) >> 6)` and the second is `0x80 | (x & 0x3F)` (see the sketch below).
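
A short sketch of that loop, assuming the input really is Latin-1 (so each byte's value equals its Unicode code point); the function name is just illustrative:

    #include <string>

    // Sketch: valid only if the input is Latin-1.
    std::string Latin1ToUtf8(const std::string& latin1)
    {
        std::string utf8;
        for (std::string::size_type i = 0; i < latin1.size(); ++i)
        {
            unsigned char x = static_cast<unsigned char>(latin1[i]);
            if (x < 0x80)
            {
                utf8 += static_cast<char>(x);                  // ASCII: copy as is
            }
            else
            {
                utf8 += static_cast<char>(0xC0 | (x >> 6));    // 110000xx
                utf8 += static_cast<char>(0x80 | (x & 0x3F));  // 10xxxxxx
            }
        }
        return utf8;
    }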

Andrey
What "mask from Wikipedia" are you talking about? You have not generated valid UTF-8, either; UTF-8 does not have any zero bytes in it.
Rob Kennedy