I'm using a variant of the code from "How to make XMLDOMDocument include the XML Declaration?" (which can also be seen at MSDN). If I change the encoding to "UTF-16", one would expect the output to be UTF-16, and it appears to be when viewed in a text editor. In a hex editor, however, the byte-order mark is missing (despite the byteOrderMark property being set to True), and XML editors reject the document as invalid UTF-16 because of the missing BOM.

What am I overlooking?

'' # Create and load a DOMDocument object.

Dim xmlDoc As New DOMDocument60
xmlDoc.loadXML("<doc><one>test1</one><two>test2</two></doc>")

'' # Set properties on the XML writer - including BOM, XML declaration and encoding

Dim wrt As New MXXMLWriter60
wrt.byteOrderMark = True
wrt.omitXMLDeclaration = False
wrt.encoding = "UTF-16"
wrt.indent = False

'' # Set the XML writer to the SAX content handler.

Dim rdr As New SAXXMLReader60
Set rdr.contentHandler = wrt
Set rdr.dtdHandler = wrt
Set rdr.errorHandler = wrt
rdr.putProperty "http://xml.org/sax/properties/lexical-handler", wrt
rdr.putProperty "http://xml.org/sax/properties/declaration-handler", wrt

'' # Now pass the DOM through the SAX handler, and it will call the writer

rdr.parse xmlDoc

'' # Let the writer do its thing

Dim iFileNo As Integer
iFileNo = FreeFile
Open App.Path + "\saved.xml" For Output As #iFileNo
Print #iFileNo, wrt.output
Close #iFileNo

The output looks like:

<?xml version="1.0" encoding="UTF-16" standalone="no"?>
<doc><one>test1</one><two>test2</two></doc>


Why am I using VB6? It's actually in VBA (same generation, slight subset of VB6), used as the scripting-language for EMC-Captiva's InputAccel/FormWare, so switching is not an option.

+2  A: 

The problem is that when you retrieve a value from the writer's output property, you get a string. Strings in VB are always UTF-16, so that is what you get regardless of the encoding property. For the same reason a VB string has no notion of needing a BOM, so none is included either.

The encoding and the BOM properties only affect how the writer will write the XML when an implementation of IStream is assigned to the output property.

Try modifying your code around the call to parse as follows:

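' Binary stream that receives the encoded bytes (an IStream implementation)
Dim oStream As ADODB.Stream
Set oStream = New ADODB.Stream
oStream.Open
oStream.Type = adTypeBinary

' Send the writer's output to the stream instead of reading it back as a string.
' Per the comments below, make this assignment before setting the writer's other properties.
wrt.output = oStream

rdr.parse xmlDoc

' Save the stream contents, then release the stream
oStream.SaveToFile App.Path + "\saved.xml"
oStream.Close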

This should generate the desired output.
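
If a hex editor is not to hand, the result can also be checked from code. The following is only a sketch: HasUtf16LeBom is a hypothetical helper, not part of MSXML or ADO, and it assumes the writer emits little-endian UTF-16, so that the file would start with the bytes FF FE.

```vb
' Hypothetical helper: returns True if the file starts with the UTF-16 LE BOM (FF FE).
' Assumption: the writer produces little-endian UTF-16.
Function HasUtf16LeBom(ByVal sPath As String) As Boolean
    Dim iFileNo As Integer
    Dim b(0 To 1) As Byte

    iFileNo = FreeFile
    Open sPath For Binary Access Read As #iFileNo

    ' Only read if the file is long enough to hold a BOM
    If LOF(iFileNo) >= 2 Then
        Get #iFileNo, , b
        HasUtf16LeBom = (b(0) = &HFF And b(1) = &HFE)
    End If

    Close #iFileNo
End Function
```

It could be called as HasUtf16LeBom(App.Path + "\saved.xml") right after the file is saved.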

AnthonyWJones
Confirmed; I got the same result another way. It is important to set the output property *first*, before the other properties, or you still won't get the BOM.
Hans Passant
@nobugz: Yes, good catch; the assignment to the output property needs to happen before the other properties are assigned.
AnthonyWJones
"oStream.Type = adTypeBinary" in my VBA, but that did the trick. Thanks!
Michael Paulukonis
@OtherMichael: Tweaked my code accordingly. I admit I guessed at that constant; I don't have VB6 installed on the machine I'm currently using, so I tested in VBScript.
AnthonyWJones
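
Putting the answer and the comments together, the whole routine might look roughly like the sketch below. It assumes project references to Microsoft XML v6.0 and Microsoft ActiveX Data Objects. The adSaveCreateOverWrite option is an addition not present in the original answer; it is there so that a second run replaces the existing file instead of failing.

```vb
' Sketch only: the question's code reordered so that output is assigned first.

' Create and load a DOMDocument object.
Dim xmlDoc As New DOMDocument60
xmlDoc.loadXML "<doc><one>test1</one><two>test2</two></doc>"

' Binary stream that will receive the encoded bytes.
Dim oStream As ADODB.Stream
Set oStream = New ADODB.Stream
oStream.Open
oStream.Type = adTypeBinary

' Assign the output first, then the remaining writer properties.
Dim wrt As New MXXMLWriter60
wrt.output = oStream
wrt.byteOrderMark = True
wrt.omitXMLDeclaration = False
wrt.encoding = "UTF-16"
wrt.indent = False

' Set the XML writer as the SAX content handler.
Dim rdr As New SAXXMLReader60
Set rdr.contentHandler = wrt
Set rdr.dtdHandler = wrt
Set rdr.errorHandler = wrt
rdr.putProperty "http://xml.org/sax/properties/lexical-handler", wrt
rdr.putProperty "http://xml.org/sax/properties/declaration-handler", wrt

' Pass the DOM through the SAX reader; the writer encodes into the stream.
rdr.parse xmlDoc

' Assumption: overwrite an existing file rather than raise an error.
oStream.SaveToFile App.Path + "\saved.xml", adSaveCreateOverWrite
oStream.Close
```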