views:

2901

answers:

4

I'm writing out XML files using the MSXML parser, with a wrapper I downloaded from here: http://www.codeproject.com/KB/XML/JW_CXml.aspx. Works great except that when I create a new document from code (so not load from file and modify), the result is all in one big line. I'd like elements to be indented nicely so that I can read it easily in a text editor.

Googling shows many people with the same question - asked around 2001 or so. Replies usually say 'apply an XSL transformation' or 'add your own whitespace nodes'. Especially the last one makes me go %( so I'm hoping that in 2008 there's an easier way to pretty MSXML output. So my question; is there, and how do I use it?

A: 

Unless the library has a format option then the only other way is to use XSLT, or an external pretty printer ( I think htmltidy can also do xml) There doen't seem to be an option in the codeproject lib but you can specify an XSLT stylesheet to MSXML.

Martin Beckett
What a bummer. There is indeed no option in the codeproject lib, it's just a barebones wrapper around MSXML. Is there an easy way to apply the stylesheet (without writing the xml out to disk and formatting it there, then reading it back in)? What would the stylesheet look like?
Roel
+1  A: 

Try this, I found this years ago on the web.

#include <msxml2.h>

bool FormatDOMDocument (IXMLDOMDocument *pDoc, IStream *pStream)
{

    // Create the writer

    CComPtr <IMXWriter> pMXWriter;
    if (FAILED (pMXWriter.CoCreateInstance(__uuidof (MXXMLWriter), NULL, CLSCTX_ALL)))
    {
        return false;
    }
    CComPtr <ISAXContentHandler> pISAXContentHandler;
    if (FAILED (pMXWriter.QueryInterface(&pISAXContentHandler)))
    {
        return false;
    }
    CComPtr <ISAXErrorHandler> pISAXErrorHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXErrorHandler)))
    {
        return false;
    }
    CComPtr <ISAXDTDHandler> pISAXDTDHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXDTDHandler)))
    {
        return false;
    }

    if (FAILED (pMXWriter ->put_omitXMLDeclaration (VARIANT_FALSE)) ||
        FAILED (pMXWriter ->put_standalone (VARIANT_TRUE)) ||
        FAILED (pMXWriter ->put_indent (VARIANT_TRUE)) ||
        FAILED (pMXWriter ->put_encoding (L"UTF-8")))
    {
        return false;
    }

    // Create the SAX reader

    CComPtr <ISAXXMLReader> pSAXReader;
    if (FAILED (pSAXReader.CoCreateInstance (__uuidof (SAXXMLReader), NULL, CLSCTX_ALL)))
    {
        return false;
    }

    if (FAILED (pSAXReader ->putContentHandler (pISAXContentHandler)) ||
        FAILED (pSAXReader ->putDTDHandler (pISAXDTDHandler)) ||
        FAILED (pSAXReader ->putErrorHandler (pISAXErrorHandler)) ||
        FAILED (pSAXReader ->putProperty (
        L"http://xml.org/sax/properties/lexical-handler", CComVariant (pMXWriter))) ||
        FAILED (pSAXReader ->putProperty (
        L"http://xml.org/sax/properties/declaration-handler", CComVariant (pMXWriter))))
    {
        return false;
    }

    // Perform the write

    return 
       SUCCEEDED (pMXWriter ->put_output (CComVariant (pStream))) &&
       SUCCEEDED (pSAXReader ->parse (CComVariant (pDoc)));
}
Torlack
Awesome, worked great. I made some modifications to do this in-memory, I'll post those in a separate post because it won't fit in a comment.
Roel
I really wish I could find the original source myself. This saved my butt :)
Torlack
I did find a similar one on http://discuss.com.cn/xml/86274.html, that's where I got the '.GetInterfacePtr()' part from. I don't speak Chinese though so I don't understand most of the page :)
Roel
+3  A: 

Here's a modified version of the accepted answer that will transform in-memory (changes only in the last few lines but I'm posting the whole block for the convenience of future readers):

bool CXml::FormatDOMDocument(IXMLDOMDocument *pDoc)
{
    // Create the writer
    CComPtr <IMXWriter> pMXWriter;
    if (FAILED (pMXWriter.CoCreateInstance(__uuidof (MXXMLWriter), NULL, CLSCTX_ALL))) {
        return false;
    }
    CComPtr <ISAXContentHandler> pISAXContentHandler;
    if (FAILED (pMXWriter.QueryInterface(&pISAXContentHandler))) {
        return false;
    }
    CComPtr <ISAXErrorHandler> pISAXErrorHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXErrorHandler))) {
        return false;
    }
    CComPtr <ISAXDTDHandler> pISAXDTDHandler;
    if (FAILED (pMXWriter.QueryInterface (&pISAXDTDHandler))) {
        return false;
    }

    if (FAILED (pMXWriter->put_omitXMLDeclaration (VARIANT_FALSE)) ||
        FAILED (pMXWriter->put_standalone (VARIANT_TRUE)) ||
        FAILED (pMXWriter->put_indent (VARIANT_TRUE)) ||
        FAILED (pMXWriter->put_encoding (L"UTF-8")))
    {
        return false;
    }

    // Create the SAX reader
    CComPtr <ISAXXMLReader> pSAXReader;
    if (FAILED(pSAXReader.CoCreateInstance(__uuidof (SAXXMLReader), NULL, CLSCTX_ALL))) {
        return false;
    }

    if (FAILED(pSAXReader->putContentHandler (pISAXContentHandler)) ||
        FAILED(pSAXReader->putDTDHandler (pISAXDTDHandler)) ||
        FAILED(pSAXReader->putErrorHandler (pISAXErrorHandler)) ||
        FAILED(pSAXReader->putProperty (L"http://xml.org/sax/properties/lexical-handler", CComVariant (pMXWriter))) ||
        FAILED(pSAXReader->putProperty (L"http://xml.org/sax/properties/declaration-handler", CComVariant (pMXWriter))))
    {
        return false;
    }

    // Perform the write
 bool success1 = SUCCEEDED(pMXWriter->put_output(CComVariant(pDoc.GetInterfacePtr())));
 bool success2 = SUCCEEDED(pSAXReader->parse(CComVariant(pDoc.GetInterfacePtr())));

 return success1 && success2;
}
Roel
A: 

I've written a sed script a while back for basic xml indenting. You can use it as an external indenter if all else fails (save this to xmlindent.sed, and process your xml with sed -f xmlindent.sed <filename>). You might need cygwin or some other posix environment to use it though.

Here's the source:

:a
/>/!N;s/\n/ /;ta
s/  / /g;s/^ *//;s/  */ /g
/^<!--/{
:e
/-->/!N;s/\n//;te
s/-->/\n/;D;
}
/^<[?!][^>]*>/{
H;x;s/\n//;s/>.*$/>/;p;bb
}
/^<\/[^>]*>/{
H;x;s/\n//;s/>.*$/>/;s/^    //;p;bb
}
/^<[^>]*\/>/{
H;x;s/\n//;s/>.*$/>/;p;bb
}
/^<[^>]*[^\/]>/{
H;x;s/\n//;s/>.*$/>/;p;s/^/ /;bb
}
/</!ba
{
H;x;s/\n//;s/ *<.*$//;p;s/[^    ].*$//;x;s/^[^<]*//;ba
}
:b
{
s/[^    ].*$//;x;s/^<[^>]*>//;ba
}

Hrmp, tabs seem to be garbled... You can copy-waste from here instead: XML indenting with sed(1)

mitchnull