I wrote a simple package installer in WinBatch that needs to update an XML file with information about the package contents. My first stab at it involved loading the file with Msxml2.DOMDocument, adding nodes and data as required, then saving the data back to disk. This worked well enough, except that it would not create tab and CR/LF whitespace in the new data. The solution I came up with was writing an XSL stylesheet that would recreate the XML file with whitespace added back in. I'm doing this by:

  1. loading the XSL file into an Msxml2.FreeThreadedDOMDocument object
  2. setting that object as the stylesheet property of an Msxml2.XSLTemplate object
  3. creating an XSL processor via Msxml2.XSLTemplate.createProcessor()
  4. setting my original Msxml2.DOMDocument as the input property of the XSL processor
  5. calling the transform() method of the XSL processor and saving the output to a file

This works as far as reformatting the XML file with tabs and carriage returns goes, but my XML declaration comes out as either <?xml version="1.0"?> or <?xml version="1.0" encoding="UTF-16"?>, depending on whether I used the Msxml2.*.6.0 or the Msxml2.* objects (a fallback if the system doesn't have 6.0).

If the encoding is set to UTF-16, Msxml2.DOMDocument complains about trying to convert UTF-16 to a 1-byte encoding the next time I run my package installer. I've tried creating an XML declaration with createProcessingInstruction() and adding it to both the XML and XSL DOM objects, but neither seems to affect the output of the XSLTemplate processor. I've also set the encoding to UTF-8 in the <xsl:output/> tag in my XSL file.

Here is the relevant code in my WinBatch script:

    xmlDoc = ObjectCreate("Msxml2.DOMDocument.6.0")
    if !xmlDoc then xmlDoc = ObjectCreate("Msxml2.DOMDocument")

    xmlDoc.async = @FALSE
    xmlDoc.validateOnParse = @TRUE
    xmlDoc.resolveExternals = @TRUE
    xmlDoc.preserveWhiteSpace = @TRUE
    xmlDoc.setProperty("SelectionLanguage", "XPath")
    xmlDoc.setProperty("SelectionNamespaces", "xmlns:fns='http://www.abc.com/f_namespace'")
    xmlDoc.load(xml_file_path)

    xslStyleSheet = ObjectCreate("Msxml2.FreeThreadedDOMDocument.6.0")
    if !xslStyleSheet then xslStyleSheet = ObjectCreate("Msxml2.FreeThreadedDOMDocument")

    xslStyleSheet.async = @FALSE
    xslStyleSheet.validateOnParse = @TRUE
    xslStyleSheet.load(xsl_style_sheet_path)

    xslTemplate = ObjectCreate("Msxml2.XSLTemplate.6.0")
    if !xslTemplate then xslTemplate = ObjectCreate("Msxml2.XSLTemplate")

    xslTemplate.stylesheet = xslStyleSheet

    processor = xslTemplate.createProcessor()
    processor.input = xmlDoc
    processor.transform()

    ; create a new file and write the XML processor output to it
    fh = FileOpen(output_file_path, "WRITE" , @FALSE)
    FileWrite(fh, processor.output)
    FileClose(fh)

The style sheet, with some slight changes to protect the innocent:

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.1">
    <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
    <xsl:template match="/">
        <fns:test_station xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:fns="http://www.abc.com/f_namespace">
            <xsl:for-each select="/fns:test_station/identification">
                <xsl:text>&#x0A;    </xsl:text>
                <identification>
                    <xsl:for-each select="./*">
                        <xsl:text>&#x0A;        </xsl:text>
                        <xsl:copy-of select="."/>
                    </xsl:for-each>
                    <xsl:text>&#x0A;    </xsl:text>
                </identification>
            </xsl:for-each>
            <xsl:for-each select="/fns:test_station/software">
                <xsl:text>&#x0A;    </xsl:text>
                <software>
                    <xsl:for-each select="./package">
                        <xsl:text>&#x0A;        </xsl:text>
                        <package>
                            <xsl:for-each select="./*">
                                <xsl:text>&#x0A;            </xsl:text>
                                <xsl:copy-of select="."/>
                            </xsl:for-each>
                            <xsl:text>&#x0A;        </xsl:text>
                        </package>
                    </xsl:for-each>
                    <xsl:text>&#x0A;    </xsl:text>
                </software>
            </xsl:for-each>
            <xsl:for-each select="/fns:test_station/calibration">
                <xsl:text>&#x0A;    </xsl:text>
                <calibration>
                    <xsl:for-each select="./item">
                        <xsl:text>&#x0A;        </xsl:text>
                        <item>
                            <xsl:for-each select="./*">
                                <xsl:text>&#x0A;            </xsl:text>
                                <xsl:copy-of select="."/>
                            </xsl:for-each>
                            <xsl:text>&#x0A;        </xsl:text>
                        </item>
                    </xsl:for-each>
                    <xsl:text>&#x0A;    </xsl:text>
                </calibration>
            </xsl:for-each>
        </fns:test_station>
    </xsl:template>
</xsl:stylesheet>

And this is a sample output file:

<?xml version="1.0" encoding="UTF-16"?>
<fns:test_station xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:fns="http://www.abc.com/f_namespace">
    <software>
        <package>
            <part_number>123456789</part_number>
            <version>00</version>
            <test_category>1</test_category>
            <description>name of software package</description>
            <execution_path>c:\program files\test\test.exe</execution_path>
            <execution_arguments>arguments</execution_arguments>
            <crc_path>c:\ste_config\crc\123456789.lst</crc_path>
            <uninstall_path>c:\ste_config\uninstall\uninst_123456789.bat</uninstall_path>
            <install_timestamp>2009-11-09T14:00:44</install_timestamp>
        </package>
    </software>
</fns:test_station>
A: 

The problem is that the output of the transform() method of the XSLT processor is being serialised as a string when you access the output property (either directly or indirectly), and Windows uses UTF-16 encoding for strings. The MSDN documentation of the output property mentions this almost in passing at the foot of the page:

In this case, the output is always generated in the Unicode encoding, and the encoding attribute on the <xsl:output> element is ignored.

(where they mean UTF-16 when they say "the Unicode encoding".)

If you use transformNodeToObject and specify a new DOMDocument object as the output, you can then save that document, which serialises the content to disk in UTF-8.

Better still for your case, if you have an object implementing the IStream interface such as the stream associated with the file you're trying to save, you can pass that to transformNodeToObject to send the UTF-8 output directly to disk. (I can't remember if you have to open and close the file manually in this case, so you'll have to experiment with that.)
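A minimal, untested sketch of the first suggestion, assuming `xmlDoc` and `xslStyleSheet` are already loaded as in the question. The names `outDoc` and `output2_path` are made up for this illustration. Whether the stylesheet's explicit whitespace survives the trip through a DOM may depend on `preserveWhiteSpace`, so treat this as a starting point rather than a confirmed fix:

```winbatch
; Sketch only: untested. xmlDoc and xslStyleSheet come from the question's script.
; outDoc and output2_path are hypothetical names used for illustration.
outDoc = ObjectCreate("Msxml2.DOMDocument.6.0")
if !outDoc then outDoc = ObjectCreate("Msxml2.DOMDocument")

outDoc.async = @FALSE
outDoc.preserveWhiteSpace = @TRUE   ; assumption: keeps whitespace text nodes emitted by the stylesheet

; Build the transform result into outDoc instead of reading it back as a string
xmlDoc.transformNodeToObject(xslStyleSheet, outDoc)

; save() writes bytes to disk, so the encoding requested in <xsl:output/> can take effect
outDoc.save(output2_path)
```

If the saved file still declares UTF-16, passing a stream object as the second argument instead of `outDoc` is the next thing to try.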

NickFitz
I tried this after the script code in my original question, to create a different output file for comparison: `xmlDoc.transformNode(xslStyleSheet)` followed by `xmlDoc.save('c:\ste_config\output2.xml')`. That causes `<package>...</software>` to be all on one line despite the explicit whitespace in my XSL file.
aspiehler
`<xsl:output indent="yes" encoding="UTF-8">` (and whatever other things you need on `xsl:output`) should fix that: http://www.w3.org/TR/1999/REC-xslt-19991116#output You may also want to check the section of the spec on whitespace: http://www.w3.org/TR/1999/REC-xslt-19991116#strip and the XSLT FAQ section on whitespace: http://www.dpawson.co.uk/xsl/sect2/N8321.html
NickFitz
`<xsl:output method="xml" indent="yes" encoding="UTF-8"/>` is already in my XSL file, but the encoding directive gets ignored by the `Msxml2.XSLProcessor`. I tried `transformNode` and doing that outputs the encoding I specified, but doesn't alter the whitespace of the input XML.
aspiehler
A: 

You could try using ADODB.Stream to save it in the UTF-8 encoding.

While I don't have WinBatch, extrapolating from VBScript, something like the following should work:

    ; translated from VBScript, untested in WinBatch
    oStream = ObjectCreate("ADODB.Stream")
    oStream.Open()
    oStream.Charset = "UTF-8"

    ; send the processor output to the stream instead of reading it as a string
    processor.output = oStream
    processor.transform()

    oStream.SaveToFile(output_file_path)
    oStream.Close()
Carlos da Costa