Hi guys,
I have written a wrapper on top of MSXML in C++. The load method is shown below. The problem is that it sometimes fails to load well-formed XML.

Before passing the XML as a string, I search for xmlns and replace every occurrence with xmlns:dns. In the code below I strip the BOM, then try to load the string with the MSXML loadXML method. If the load succeeds, I set the selection namespace as shown in the code.

class XmlDocument {

    MSXML2::IXMLDOMDocument2Ptr spXMLDOM;
    ....
};

// XmlDocument methods

void XmlDocument::Initialize()
{
    CoInitialize(NULL);
    HRESULT hr = spXMLDOM.CreateInstance(__uuidof(MSXML2::DOMDocument60));
    if (FAILED(hr))
    {
        throw "Unable to create MSXML::DOMDocument object";
    }
}

bool XmlDocument::LoadXml(const char* xmltext)
{
    if (spXMLDOM != NULL)
    {
        // Skip a leading UTF-8 BOM, if present
        const char BOM[3] = {'\xEF', '\xBB', '\xBF'};
        if (strncmp(xmltext, BOM, sizeof(BOM)) == 0)
        {
            xmltext += 3;
        }

        VARIANT_BOOL bSuccess = spXMLDOM->loadXML(A2BSTR(xmltext));
        if (bSuccess == VARIANT_TRUE)
        {
            spXMLDOM->setProperty("SelectionNamespaces", "xmlns:dns=\"http://www.w3.org/2005/Atom\"");
            return true;
        }
    }
    return false;
}

I tried debugging but still could not figure out why loadXML() sometimes fails to load even well-formed XML. What am I doing wrong in the code? Any help is greatly appreciated.

Thanks, JeeZ

+1  A: 

I'm not a fan of A2BSTR. At the very least you're leaking memory, as the returned BSTR is never deallocated.

You could just as easily write

      VARIANT_BOOL bSuccess = spXMLDOM->loadXML(CComBSTR(xmltext));

which will handle the memory properly.

As to why it's failing: you can ask the DOMDocument for its parseError object (IXMLDOMParseError) and fetch the reason from it. That will probably shed more light on what the real problem is.
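
A minimal sketch of reading that error object, assuming the same #import-generated MSXML2 smart pointers used in the question. FormatParseError and DescribeLastParseError are hypothetical names, not part of MSXML. The COM part is guarded by _MSC_VER because #import is a Visual C++ extension; the formatting helper is plain C++.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: turns the fields of a parse error into one log line.
std::string FormatParseError(long code, long line, long column, const std::string& reason)
{
    assert(line >= 0 && column >= 0);  // positions are expected to be non-negative
    std::ostringstream out;
    out << "XML parse error 0x" << std::hex << std::uppercase
        << static_cast<unsigned int>(code) << std::dec
        << " at line " << line << ", column " << column << ": " << reason;
    return out.str();
}

#ifdef _MSC_VER
#import <msxml6.dll>

// Hypothetical helper: returns an empty string when the last load succeeded.
std::string DescribeLastParseError(MSXML2::IXMLDOMDocument2Ptr doc)
{
    MSXML2::IXMLDOMParseErrorPtr err = doc->parseError;
    if (err == NULL || err->errorCode == 0)
        return std::string();

    _bstr_t reason = err->reason;
    const char* text = static_cast<const char*>(reason);  // ANSI conversion, fine for logging
    return FormatParseError(err->errorCode, err->line, err->linepos, text ? text : "");
}
#endif
```

Calling DescribeLastParseError(spXMLDOM) right after loadXML returns VARIANT_FALSE should give you the error code, the position, and the parser's reason text.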

Ruddy
Thanks, Ruddy, for enlightening me. I was not aware that A2BSTR leaks memory; I will use CComBSTR from now on. I will do as you suggested and try to figure out the reason.
JeeZ
+1  A: 

For this specific issue, please refer to "Strings Passed to loadXML must be UTF-16 Encoded BSTRs".

Overall, an XML parser is not designed for parsing in-memory strings: loadXML does not recognize a BOM, and it restricts the encoding. Rather, an XML parser is designed to consume a byte array and detect its encoding, which is critical for a standards-conforming parser. To make better use of MSXML, please consider loading from an IStream or a Win32 file.
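
If you do want to stay with loadXML, the char* data has to be turned into UTF-16 first. As far as I know, A2BSTR converts using the ANSI code page, so UTF-8 input with non-ASCII characters gets mangled on the way in. On Windows the usual tool is MultiByteToWideChar with CP_UTF8. The portable sketch below only illustrates what that conversion does; Utf8ToUtf16 is a hypothetical name, it assumes the input really is UTF-8, and it is simplified (it does not reject overlong forms or encoded surrogates).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: decodes NUL-terminated UTF-8 into UTF-16 code units.
// A leading UTF-8 BOM is skipped; malformed sequences become U+FFFD.
std::u16string Utf8ToUtf16(const char* utf8)
{
    assert(utf8 != nullptr);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8);

    // Skip the BOM (EF BB BF); short-circuiting keeps us inside the string.
    if (p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        p += 3;

    std::u16string out;
    while (*p) {
        char32_t cp;
        int extra;  // number of continuation bytes expected
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;    extra = 0; }
        ++p;

        for (; extra > 0; --extra) {
            if ((*p & 0xC0) != 0x80) { cp = 0xFFFD; break; }  // truncated sequence
            cp = (cp << 6) | (*p & 0x3F);
            ++p;
        }
        if (cp > 0x10FFFF)
            cp = 0xFFFD;

        if (cp >= 0x10000) {
            // Outside the BMP: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

On Windows, wchar_t is 16 bits wide, so a buffer like this (or the output of MultiByteToWideChar) holds the UTF-16 text that loadXML expects in its BSTR.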

Samuel Zhang
A: 

We use

hr = m_pXMLDoc->load(_variant_t(xml_file.c_str()), &varStatus);
hr = m_pXMLDoc->loadXML(_bstr_t(xml_doc.c_str()), &varStatus);

for loading files and raw XML, respectively.

graham.reeds