How does one tell the XML parser to honor leading and trailing whitespace?

Dim xml: Set xml = CreateObject("MSXML2.DOMDocument")
xml.async = False
xml.loadxml "<xml>1 2</xml>"
wscript.echo len(xml.documentelement.text)

The above prints 3.

Dim xml: Set xml = CreateObject("MSXML2.DOMDocument")
xml.async = False
xml.loadxml "<xml> 2</xml>"
wscript.echo len(xml.documentelement.text)

The above prints 1. (I'd like it to print 2.)

Is there something special I can put in the xml document itself to tell the parser to keep leading and trailing whitespace in the document?

CLARIFICATION 1: Is there an attribute that can be specified ONCE at the beginning of the document to apply to all elements?

CLARIFICATION 2: Because the contents of the entities may contain Unicode data but the XML file needs to be plain ASCII, all entities are encoded, which means CDATA sections are unfortunately not available.
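
To illustrate Clarification 2, here is a minimal Python sketch using only the standard library. It is not the MSXML setup from the question, and it assumes that "encoded" means numeric character references such as `&#233;`. It suggests why CDATA is ruled out: a character reference is expanded in ordinary element content but stays literal text inside a CDATA section.

```python
import xml.etree.ElementTree as ET

# Assumption: non-ASCII data is written as numeric character references
# so that the XML file itself stays plain ASCII.
ascii_payload = "\u00e9".encode("ascii", "xmlcharrefreplace").decode("ascii")

# Same payload, once as ordinary element content and once inside CDATA.
plain = ET.fromstring("<xml>" + ascii_payload + "</xml>")
cdata = ET.fromstring("<xml><![CDATA[" + ascii_payload + "]]></xml>")

print(repr(plain.text))  # the parser expands the reference to the character
print(repr(cdata.text))  # the reference is kept as literal characters
```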

+2  A: 

You could try putting it into a CDATA block:

<xml><![CDATA[ 2]]></xml>
Ferruccio
To all that answered with "xml:space": This problem has nothing to do with xml:space, which controls how a parser treats whitespace-*only* nodes. The nodes shown are definitely not whitespace-only.
Dimitre Novatchev
As per my previous comment, I recommend that you withdraw the incorrect answers, or that other people downvote them. Cheers,
Dimitre Novatchev
The infoset is exactly the same, with or without CDATA. This is not the problem.
bortzmeyer
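
The comment about CDATA can be checked with a short Python sketch (standard library `xml.etree.ElementTree`, used here as a stand-in and not the MSXML parser from the question). Under that assumption, the character data the application sees is the same whether or not the text is wrapped in a CDATA section, leading space included.

```python
import xml.etree.ElementTree as ET

# The same element content, with and without a CDATA wrapper.
with_cdata = ET.fromstring("<xml><![CDATA[ 2]]></xml>")
without_cdata = ET.fromstring("<xml> 2</xml>")

# Both expose identical character data to the application.
print(repr(with_cdata.text), repr(without_cdata.text))
print(with_cdata.text == without_cdata.text)
```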
+3  A: 

As I commented, all answers recommending the use of xml:space="preserve" are wrong.

The xml:space attribute can only be used to control the treatment of whitespace-only nodes, that is, text nodes composed entirely of whitespace characters.

This is not at all the case with the current problem.

In fact, the code provided below correctly obtains a length of 2 for the text node contained in:

<xml> 2</xml>

Here is the VB code that correctly gets the length of the text node (do not forget to add a reference to "Microsoft XML, v 3.0"):

Dim xml As MSXML2.DOMDocument
Private Sub Form_Load()
    Set xml = CreateObject("MSXML2.DOMDocument")
    xml.async = False
    xml.loadXML "<xml> 2</xml>"
    Dim n
    ' Read the text node's nodeValue, not the element's text property
    n = Len(xml.documentElement.selectSingleNode("text()").nodeValue)
    ' Debug.Print replaces wscript.echo, which is not available in a VB form
    Debug.Print n
End Sub

If you put a breakpoint on the line:

Debug.Print n

you'll see that when the debugger breaks there, the value of n is 2, as required.

Therefore, this code is the solution that was being sought.
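
The same point can be cross-checked outside MSXML. The following is a minimal sketch in Python with the standard library's `xml.dom.minidom`, which is an assumption of this example and not the parser used in the question; the helper name `first_text_value` is hypothetical. It reads `nodeValue` from the first child of the document element and sets no `xml:space` attribute.

```python
from xml.dom import minidom


def first_text_value(xml_string):
    """Return nodeValue of the first child of the document element."""
    doc = minidom.parseString(xml_string)
    return doc.documentElement.firstChild.nodeValue


# The two documents from the question; no xml:space attribute is used.
for sample in ("<xml>1 2</xml>", "<xml> 2</xml>"):
    value = first_text_value(sample)
    print(repr(value), len(value))
```

In this sketch the leading space survives because it belongs to a text node that also contains non-whitespace characters.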

Dimitre Novatchev
The xml:space="preserve" attribute worked, though. I don't know who deleted the answers that suggested it, but it worked fine for me.
Michael Pryor
@michaelpryor: More accurately, the answer to the original question is: "No, nothing special needs to be put in the XML document, as the parser does not trim any non-whitespace text node. Simply use the 'nodeValue' property and do not use the 'text' property."
Dimitre Novatchev
+1  A: 

As mentioned by Dimitre Novatchev, in XML, whitespace is not deleted at will by the parser. The whitespace is part of the node's value. Since I do not speak Visual Basic, here is a C program using libxml that prints the length of each text node directly under the root element. There is absolutely no need to set xml:space.

% ./whitespace "<foo> </foo>"
Length of " " is 1

% ./whitespace "<foo> 2</foo>"
Length of " 2" is 2

% ./whitespace "<foo>1 2</foo>" 
Length of "1 2" is 3

Here is the program:

#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>

int
main(int argc, char **argv)
{
    char           *xml;
    xmlDoc         *doc;
    xmlNode        *root, *node;
    if (argc < 2) {
        fprintf(stderr, "Usage: %s XML-string\n", argv[0]);
        return 1;
    }
    xml = argv[1];
    doc = xmlReadMemory(xml, (int) strlen(xml), "my data", NULL, 0);
    if (doc == NULL) {          /* Input was not well-formed XML */
        fprintf(stderr, "Cannot parse \"%s\"\n", xml);
        return 1;
    }
    root = xmlDocGetRootElement(doc);
    /* Walk the children of the root element and report each text node */
    for (node = root ? root->children : NULL; node; node = node->next) {
        if (node->type == XML_TEXT_NODE) {
            /* strlen returns size_t, so print it with %zu */
            fprintf(stdout, "Length of \"%s\" is %zu\n", (char *) node->content,
                    strlen((char *) node->content));
        }
    }
    xmlFreeDoc(doc);
    return 0;
}
bortzmeyer
A: 

Hello,

I have a problem with leading spaces in XML files. I am trying to write code that produces an XML file, but I don't want leading spaces when I use the root element and member element methods. After the root element, the member elements are indented by two space characters. Is there a solution for that? I want all lines aligned to the left margin of the page.

Thanks

Mumin Raif
Try asking as a new question (and post your code samples), instead of posting an answer to this existing question.
Michael Pryor