I have a problem trying to get my head around using UTF8 with Poco::XML::XMLWriter
. In the following code example, everything works fine when the input contains ASCII characters. However, occasionally the string in wordmapIt->first
contains a non-ASCII value, such as a character -105 occurring in the middle of a string. When this happens the xml stream seems to terminate on the -105 char even though there are many other words after this one. I want to save whatever string was there so just stripping the char out isn't the right answer - theres got to be some kind of encoding I can apply (I think) but what?
I'm clearly missing something conceptually but for the life of me I cant figure out the right way to do this.
Poco::XML::XMLString EDocument::makeXMLString()
{
std::stringstream xmlstream;
Poco::UTF8Encoding utf8encoding;
Poco::XML::XMLWriter writer(xmlstream, 0, "UTF-8", &utf8encoding);
writer.startDocument();
std::map<std::string, std::string>::iterator wordmapIt;
for ( wordmapIt = nodeinfo->wordmap.begin(); wordmapIt != nodeinfo->wordmap.end(); wordmapIt++ )
{
writer.startElement("", "", "word");
writer.characters(Poco::XML::toXMLString(wordmapIt->first));
writer.endElement("", "", "word");
}
writer.endDocument();
return xmlstream.str();
}
Edit: Solution based on answer below.
Poco::XML::XMLString EDocument::makeXMLString()
{
std::stringstream xmlstream;
Poco::UTF8Encoding utf8encoding;
Poco::XML::XMLWriter writer(xmlstream, 0, "UTF-8", &utf8encoding);
Poco::Windows1252Encoding windows1252encoding;
Poco::UTF8Encoding utf8encoding;
Poco::TextConverter textconverter(windows1252encoding, utf8encoding);
writer.startDocument();
std::map<std::string, std::string>::iterator wordmapIt;
for ( wordmapIt = nodeinfo->wordmap.begin(); wordmapIt != nodeinfo->wordmap.end(); wordmapIt++ )
{
std::string strword;
textconverter.convert(wordmapIt->first, strword);
writer.startElement("", "", "word");
writer.characters(strword);
writer.endElement("", "", "word");
}
writer.endDocument();
return xmlstream.str();
}