views:

50

answers:

2

I have a Xerces (2.6) DOMNode object encoded UTF-8. I use to read its TEXT element like this:

CBuffer DomNodeExtended::getText( const DOMNode* node ) const {
  char* p = XMLString::transcode( node->getNodeValue( ) );
  CBuffer xNodeText( p );
  delete p;
  return xNodeText;
}

Where CBuffer is, well, just a buffer object which is lately persisted as it is in a DB.

This works until in the TEXT there are just common ASCII characters. If we have i.e. chinese ones they get lost in the transcode operation.

I've googled a lot seeking for a solution. It looks like with Xerces 3, the DOMWriter class should solve the problem. With Xerces 2.6 I'm trying the XMLTranscoder, but no success yet. Could anybody help?


Edit It looks I was wrong and the DOMWriter class is already available in Xerces 2.6. I'm now trying a solution based on it.

+1  A: 

Don't use the transcode method. The documentation clearly states that it translates the text to the "native code-page" - which is almost always a lossy operation.

Chris Becke
+1  A: 

I've now solved it as follows. I'm still not sure this is the optimal solution though

CBuffer DomNodeExtended::getText( const DOMNode* node ) const
{
  XMLCh tempStr[100];  
  XMLString::transcode("LS", tempStr, 99);
  DOMImplementation *impl = 
      DOMImplementationRegistry::getDOMImplementation(tempStr);
  DOMWriter* myWriter = ((DOMImplementationLS*)impl)->createDOMWriter();
  XMLCh *strNodeValue = myWriter->writeToString(*node);

  XMLTransService::Codes resCode;  
  XMLTranscoder* t = 
      XMLPlatformUtils::fgTransService->makeNewTranscoderFor(
      "UTF-8", resCode, 16*1024);

  unsigned int charsEaten = 0;  
  unsigned int charsReturned = 0;
  char bytesNodeValue[16*1024+4];  
  charsReturned = t->transcodeTo( strNodeValue,
                                  XMLString::stringLen(strNodeValue),
                                  (XMLByte*) bytesNodeValue,
                                  16*1024,
                                  charsEaten,
                                  XMLTranscoder::UnRep_Throw);

  CBuffer xNodeText( bytesNodeValue, charsReturned);

  XMLString::release(&strNodeValue);  
  myWriter->release();
  delete t;

  return xNodeText;
}
Gianluca