All,

this is my code

//declare string pointer
BSTR markup;

//initialize markup to some well formed XML <-

//declare and initialize XML Document
MSXML2::IXMLDOMDocument2Ptr pXMLDoc;
HRESULT hr;
hr = pXMLDoc.CreateInstance(__uuidof(MSXML2::DOMDocument40));
pXMLDoc->async = VARIANT_FALSE;
pXMLDoc->validateOnParse = VARIANT_TRUE;
pXMLDoc->preserveWhiteSpace = VARIANT_TRUE;    

//load markup into XML document
vtBoolResult = pXMLDoc->loadXML(markup);

//do some changes to the XML file<-

//get back string from XML doc
markup = pXMLDoc->Getxml(); //<-- this retrieves RUBBISH

At this point my string is mangled (just a few Chinese characters at the start, then rubbish). It looks like an encoding issue.

I also tried the following:

_bstr_t superMarkup = _bstr_t(markup);

//did my stuff

superMarkup = pXMLDoc->Getxml();

markup = superMarkup;

but still I am getting the same result.

Even if I call GetXML() without changing anything in the xml document I still get rubbish.

At this point, if I try to assign the mangled pointer to another pointer, it will throw an error:

Attempted to restore write protected memory. this is often an indication that other memory is corrupted.

Any suggestion?

EDIT1:

I found out that this depends on the size of the XML string. If it happens with a given XML string and I reduce its size (keeping the same schema), it works fine. Does MSXML2::DOMDocument40 have a size limitation? In detail, it happens if I have more than 16407 characters: with even one more, GetXML retrieves RUBBISH; with <= 16407, everything works fine.

EDIT2:

Roddy was right - I was missing that _bstr_t is a class ...

Does this ring any bells?

Cheers

A: 

What string is getting messed up? You haven't initialised markup to anything...

Patrick
some generic well-formed XML (just edited original post)
JohnIdol
+3  A: 

Try replacing

 BSTR Markup;

with

 _bstr_t Markup;

BSTR is pretty much a dumb pointer, and I think that the return result of GetXML() is being converted to a temporary which is then destroyed by the time you get to see it. _bstr_t wraps that with some smart-pointer goodness...

Note: Your "SuperMarkup" thing did NOT do what I suggested. Again, BSTR is just a pointer, and doesn't "own" what it points to. _bstr_t, on the other hand, does. I think your GetXML() function is returning a _bstr_t, which is then being deleted as it goes out of scope, leaving your BSTR pointing to memory that is no longer valid.
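The ownership point above can be modelled without any COM headers. The following is only a minimal, portable sketch: `OwnedText` and `get_xml` are hypothetical stand-ins for `_bstr_t` and `pXMLDoc->Getxml()`, and a live-owner counter is used to show when the buffer is released. It is not the real MSXML API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for _bstr_t: owns its text and counts live owners.
class OwnedText {
public:
    explicit OwnedText(std::wstring text) : text_(std::move(text)) { ++live_; }
    OwnedText(const OwnedText& other) : text_(other.text_) { ++live_; }
    OwnedText& operator=(const OwnedText&) = default;
    ~OwnedText() { --live_; }

    // Like converting _bstr_t to a raw pointer: ownership is NOT transferred.
    const wchar_t* raw() const { return text_.c_str(); }
    std::size_t length() const { return text_.size(); }

    // Number of OwnedText objects currently alive.
    static int live() { return live_; }

private:
    std::wstring text_;
    static int live_;
};
int OwnedText::live_ = 0;

// Hypothetical stand-in for pXMLDoc->Getxml(): returns an owning object by value.
OwnedText get_xml() { return OwnedText(L"<XML><child/></XML>"); }

// Pattern from the question: only a raw pointer is kept, so the temporary
// owner is destroyed at the end of the statement. The pointer is never
// dereferenced here; we only report how many owners are still alive.
int owners_alive_after_raw_assignment() {
    const wchar_t* markup = get_xml().raw();  // dangling after this line
    (void)markup;
    return OwnedText::live();                 // no owner left
}

// Suggested pattern: store the result in an owning variable.
int owners_alive_while_owner_held() {
    OwnedText markup = get_xml();
    return OwnedText::live();                 // 'markup' keeps the text alive
}

// With an owner in scope, the text can be used safely.
std::size_t length_via_owner() {
    OwnedText markup = get_xml();
    return markup.length();
}
```

In this model, `owners_alive_after_raw_assignment()` reports zero owners, which corresponds to a BSTR left pointing at freed memory, while `owners_alive_while_owner_held()` reports one.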

Roddy
I tried this: _bstr_t superMarkup = _bstr_t(markup); then used superMarkup instead of markup in the LoadXML call. Nothing changed!
JohnIdol
Have 2 variables, a _bstr_t initialXML and a _bstr_t returnedXML. Only use each variable for one purpose.
Patrick
See edit - looks like it depends on size
JohnIdol
If you don't fully understand BSTR and _bstr_t (and thus who, if anyone, owns the memory containing the result from GetXML), it will probably depend on the phase of the moon as well. I can't offer any more help than that.
Roddy
Even if I understand BSTR, _bstr_t and all you want, it still doesn't make sense that it works fine if the size is below exactly 16407.
JohnIdol
Anyway, I did try replacing BSTR with _bstr_t and it works fine IF the size is below that magic number, exactly the same as with my original code.
JohnIdol
+1  A: 

OK, I think Patrick is right. I took your code and made a quick ATL EXE project named getxmltest. I added this line after the #include directives:

#import "MSXML3.DLL"

I removed the post-build event that registers the component, because I don't want to expose any component from the EXE; I only want the ATL headers and libs already referenced. Then I added the following code to _tWinMain:

extern "C" int WINAPI _tWinMain(HINSTANCE /*hInstance*/, HINSTANCE /*hPrevInstance*/, 
                                LPTSTR /*lpCmdLine*/, int nShowCmd)
{
    CoInitialize(NULL);
    {
     //declare string pointer
     _bstr_t      markup;
     //initialize markup to some well formed XML <-
     //declare and initialize XML Document
     MSXML2::IXMLDOMDocument2Ptr pXMLDoc;
     HRESULT      hr    = pXMLDoc.CreateInstance(__uuidof(MSXML2::DOMDocument));

     pXMLDoc->async    = VARIANT_FALSE;
     pXMLDoc->validateOnParse = VARIANT_TRUE;
     pXMLDoc->preserveWhiteSpace = VARIANT_TRUE;    

     //load markup into XML document
     VARIANT_BOOL    vtBoolResult = pXMLDoc->loadXML(L"<XML></XML>");

     //do some changes to the XML file<-
     //get back string from XML doc
     markup = pXMLDoc->Getxml(); //<-- this retrieves RUBBISH (not anymore...)
     ATLTRACE("%S", (BSTR)markup.GetBSTR());
    }
    CoUninitialize();
    return _AtlModule.WinMain(nShowCmd);
}

The resulting trace lines were the following...

'getxmltest.exe': Loaded 'C:\Windows\winsxs\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.6001.18000_none_5cdbaa5a083979cc\comctl32.dll'
<XML></XML>
'getxmltest.exe': Unloaded 'C:\Windows\SysWOW64\msxml3.dll'
The program '[6040] getxmltest.exe: Native' has exited with code 0 (0x0).

Here we can see the string we entered initially. I didn't add any logic to the code because I thought this was enough to display the resulting XML after processing it with the MSXML engine. Obviously you can do some more testing with this code and see what happens next.

Eugenio Miró
Can you reproduce my problem if size > 16407 (see edit in OP)?
JohnIdol
I added a couple of lines to my code to add elements dynamically to the XML, obtaining XML with a length of 160013 bytes with no problem. I'll post the final code in a new response below.
Eugenio Miró
+1  A: 

I'm not proficient with this particular xml library, however:

Something to note here is that the original question overwrote the variable 'markup' as it retrieved the result. Many XML parsers return pointers to the initial input (i.e. markup), so when you replace it with the output, you also delete the input to the XML parser.

It seems possible that this process would invalidate the string that you just received. You will notice that Eugenio Miró does not make this mistake in his example, as he allocates a different variable to hold the input (pXMLDoc).

A quick test you might like to do is to change

//get back string from XML doc
markup = pXMLDoc->Getxml(); //<-- this retrieves RUBBISH

to

//get back string from XML doc
BSTR output = pXMLDoc->Getxml(); //<-- perhaps this doesn't

and see if that makes a difference.

Tom Leys
thanks for pointing this out but I do not understand your consideration about pXMLDoc!
JohnIdol
A: 

This is the code I wrote before, with a small modification that adds 20000 'child' elements :) and it works well.

extern "C" int WINAPI _tWinMain(HINSTANCE /*hInstance*/, HINSTANCE /*hPrevInstance*/, 
                                LPTSTR /*lpCmdLine*/, int nShowCmd)
{
    CoInitialize(NULL);
    {
        //declare string pointer
        _bstr_t                           markup;
        //initialize markup to some well formed XML <-
        //declare and initialize XML Document
        try {
            MSXML2::IXMLDOMDocument2Ptr   pXMLDoc;
            HRESULT                       hr              = pXMLDoc.CreateInstance(__uuidof(MSXML2::DOMDocument));    
            pXMLDoc->async                = VARIANT_FALSE;
            pXMLDoc->validateOnParse      = VARIANT_TRUE;
            pXMLDoc->preserveWhiteSpace   = VARIANT_TRUE;    

            //load markup into XML document
            VARIANT_BOOL                  vtBoolResult    = pXMLDoc->loadXML(L"<XML></XML>");

            for (int i = 0; i < 20000; i++) {
                MSXML2::IXMLDOMNodePtr    node            = pXMLDoc->createNode(_variant_t("element"), _bstr_t("child"), _bstr_t(""));

                if (node)
                    pXMLDoc->documentElement->appendChild(node);
            }

            //do some changes to the XML file<-
            //get back string from XML doc
            markup = pXMLDoc->Getxml(); //<-- th
            ATLTRACE("XML lenght = %d, xml=%S\n", markup.length(), (BSTR)markup.GetBSTR());
        } catch(_com_error e) {
            ATLTRACE("error = %S\n", (BSTR)e.ErrorMessage());
        }
    }
    CoUninitialize();
    return _AtlModule.WinMain(nShowCmd);
}

In the debugger this produces an output line truncated at 1024 characters, but the code could easily print the XML to standard output instead if you wish. This is the output I get so far:

'getxmltest.exe': Loaded 'C:\Windows\winsxs\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.6001.18000_none_5cdbaa5a083979cc\comctl32.dll'
XML lenght = 160013, xml=<XML><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><child/><'getxmltest.exe': Unloaded 'C:\Windows\SysWOW64\msxml3.dll'
The program '[4884] getxmltest.exe: Native' has exited with code 0 (0x0).
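If the full XML is wanted in the trace, one option is to emit it in pieces. This is a minimal sketch assuming the trace facility cuts each call at a fixed length (1024 characters in the run above); `split_for_trace` is a hypothetical helper name, not part of ATL or MSXML.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits 'text' into consecutive pieces of at most 'max_len' characters,
// so each piece can be traced or printed on its own.
// Returns an empty vector when 'text' is empty or 'max_len' is zero.
std::vector<std::wstring> split_for_trace(const std::wstring& text,
                                          std::size_t max_len) {
    std::vector<std::wstring> pieces;
    if (max_len == 0) {
        return pieces;  // nothing sensible to do with a zero-length limit
    }
    for (std::size_t pos = 0; pos < text.size(); pos += max_len) {
        // substr clamps the count to the end of the string.
        pieces.push_back(text.substr(pos, max_len));
    }
    return pieces;
}
```

Each piece could then be passed to ATLTRACE in a loop, or written to standard output, with a limit chosen safely below the observed cut-off.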
Eugenio Miró