ansaurus

Question

Using Poco XMLWriter with UTF8 strings in C++

Answer 1

+1 A:

It sounds like you have a byte string in Windows code page 1252 encoding. “Character -105” presumably really means byte 0x97, which would map to Unicode character U+2014 Em Dash (—) in cp1252.

I'm not familiar with Poco, but I would guess you're expected to convert your cp1252 strings to UTF-8 output encoding using a TextConverter with Windows1252Encoding and UTF8Encoding.

Although if what you really have is an “ANSI string” (a byte string in the default code page for the current machine's locale), 1252 might not be the right answer and you might have to use a function from another library to do the conversion properly.
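
To make the byte-level picture concrete, here is a minimal sketch in plain C++ of the cp1252 to UTF-8 mapping that a converter such as Poco's TextConverter would be expected to perform. It deliberately does not use Poco. The names `cp1252ToCodePoint`, `appendUtf8` and `cp1252ToUtf8` are made up for this illustration, and replacing the five unassigned cp1252 bytes with U+FFFD is an assumption (other converters may handle them differently).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Unicode code points for cp1252 bytes 0x80..0x9F.
// 0 marks the five bytes that cp1252 leaves unassigned.
static const std::uint16_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178};

// Hypothetical helper: map one cp1252 byte to a Unicode code point.
std::uint32_t cp1252ToCodePoint(unsigned char byte) {
    if (byte >= 0x80 && byte <= 0x9F) {
        std::uint16_t cp = kCp1252High[byte - 0x80];
        return cp != 0 ? cp : 0xFFFD;  // assumption: unassigned -> U+FFFD
    }
    return byte;  // 0x00..0x7F and 0xA0..0xFF coincide with Latin-1
}

// Hypothetical helper: append one code point (at most U+FFFF) as UTF-8.
void appendUtf8(std::uint32_t cp, std::string& out) {
    assert(cp <= 0xFFFF);  // cp1252 never maps outside the BMP
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a whole cp1252 byte string to UTF-8.
std::string cp1252ToUtf8(const std::string& in) {
    std::string out;
    for (char c : in) {
        // Cast first: a signed char holding -105 is the byte 0x97.
        appendUtf8(cp1252ToCodePoint(static_cast<unsigned char>(c)), out);
    }
    return out;
}
```

Calling `cp1252ToUtf8` on a one-byte string holding the char value -105 yields the three bytes E2 80 94, which is the UTF-8 encoding of U+2014. The cast to `unsigned char` is the step that turns "character -105" into byte 0x97 before the table lookup.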

bobince 2010-10-25 12:12:44

Perfect! Thank you so much. My confusion arose because I'm scraping strings out of IE and was thinking, "Well, the webpage is UTF-8, so what's the problem?" But as you pointed out, the string was cp1252-encoded. Using TextConverter as you suggested to map from cp1252 to UTF-8 gave the right result. I'm editing my question to contain the answer because finding examples of this stuff is a drag.

Andrew Bucknell 2010-10-25 12:52:07
