At work we have our own XML classes that build a DOM, but I'm not sure how consecutive whitespace should be handled.

e.g.

<some-text>
Hello     World
</some-text>

When this is read into the DOM, should the text node include the consecutive whitespace in between Hello and World, or should it be reduced to a single space?

Or should the XML source be written like this:

<some-text>
Hello &nbsp;&nbsp;&nbsp;&nbsp;World
</some-text>

Or, if not &nbsp;, then perhaps &#32;?

+2  A: 

EDIT: Whitespace within an element's content is considered significant (my initial thought that this works like HTML was wrong; Google first, answer questions later!). See this explanation.

Steven A. Lowe
LOL, I know what you mean. I thought it would be just like HTML and experimented with how XHTML *displayed* and let that dictate the behaviour of our XML parser, only realising much later that it might be a bad assumption. Google(/Ask on SO) first, Implement later!
Ben Daniel
+1  A: 

IMO it seems quite natural to treat whitespace as significant in this case. I would expect the DOM node's value to be equal to what I used in the markup.

aku
+3  A: 

&nbsp; is an HTML entity and has nothing to do with XML itself.
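To make that concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. That parser is just a convenient stand-in for the demo, not the asker's in-house classes. It shows that `&#32;` is merely a character reference to an ordinary space, that `&nbsp;` is rejected when no DTD declares it, and that the numeric reference `&#160;` is the way to write a non-breaking space in plain XML.

```python
import xml.etree.ElementTree as ET

# &#32; is a character reference to U+0020, i.e. a plain space.
ref_text = ET.fromstring("<some-text>Hello&#32;&#32;World</some-text>").text

# &nbsp; is not one of XML's predefined entities (lt, gt, amp, apos, quot),
# so without a DTD declaring it the document is not well-formed.
try:
    ET.fromstring("<some-text>Hello&nbsp;World</some-text>")
    nbsp_accepted = True
except ET.ParseError:
    nbsp_accepted = False

# A non-breaking space can still be written as a numeric reference.
nbsp_text = ET.fromstring("<some-text>Hello&#160;World</some-text>").text

print(repr(ref_text))   # 'Hello  World'
print(nbsp_accepted)    # False
print(repr(nbsp_text))  # 'Hello\xa0World'
```

Note that a non-breaking space is a different character from a regular space, so it is not a way to "protect" ordinary spaces; it changes the text content.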

To answer your question, though: I would treat that whitespace as significant. Even the HTML DOM treats consecutive spaces as significant; it's just that only one space is rendered visually. How it appears in the DOM and how it appears on your screen are two entirely different things.
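For comparison, this is roughly what an existing DOM implementation does with the example from the question. The sketch assumes Python's `xml.dom.minidom`, chosen only because it ships with the standard library; other DOM implementations should behave the same way for this input.

```python
from xml.dom import minidom

source = "<some-text>\nHello     World\n</some-text>"

# Parse the document and take the single text node inside <some-text>.
document = minidom.parseString(source)
text_node = document.documentElement.firstChild

# The text node carries every character from the source, including the
# leading/trailing newlines and all five spaces between the words.
data = text_node.data
print(repr(data))  # '\nHello     World\n'
```

Any collapsing of whitespace, if it is wanted at all, is something the application does afterwards with the node's value.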

nickf
Thanks, I'm inclined to agree.
Ben Daniel
Yep. Big red flag over "we have our own XML classes", though: you really don't want to be doing that unless you've spent a while studying the specifications and can really get it right. Surely there are existing standards-compliant DOM implementations you can grab?
bobince
+5  A: 

It's a bit disconcerting to hear that people are out there implementing XML processors without even looking at the XML specifications.

From the XML 1.0 specification, section 2.10 "White Space Handling" (emphasis mine):

In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

An XML processor MUST always pass all characters in a document that are not markup through to the application. A validating XML processor MUST also inform the application which of these characters constitute white space appearing in element content.

A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications. In valid documents, this attribute, like any other, MUST be declared if it is used. When declared, it MUST be given as an enumerated type whose values are one or both of "default" and "preserve". For example: ...
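This is not the example elided from the quote above, but a rough sketch of how an application sitting on top of a parser might honour `xml:space`. It assumes Python's `xml.etree.ElementTree`; the helper name `collect_text` and the choice to collapse whitespace in "default" mode are my own assumptions, since the specification leaves default handling to the application.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def collect_text(elem, inherited="default"):
    """Map each element tag to its leading text (hypothetical helper).

    Text is kept verbatim where xml:space="preserve" is in effect and
    collapsed to single spaces otherwise (an application-level choice).
    """
    # xml:space applies to the element and is inherited by its children.
    mode = elem.get(XML_SPACE, inherited)
    raw = elem.text or ""
    text = raw if mode == "preserve" else " ".join(raw.split())
    result = {elem.tag: text}
    for child in elem:
        result.update(collect_text(child, mode))
    return result

doc = ET.fromstring(
    "<doc>"
    '<poem xml:space="preserve">Hello     World</poem>'
    "<note>Hello     World</note>"
    "</doc>"
)
texts = collect_text(doc)
print(texts["poem"])  # Hello     World
print(texts["note"])  # Hello World
```

The parser hands over all the characters in both cases; only the application-level helper decides what to do with them, which matches the division of labour described in the quoted section.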

Wim Coenen
A: 

If you "have our own XML classes," and whoever wrote them doesn't already know the answer to this question, you probably have many, many, many more problems that you just haven't discovered yet.

Robert Rossney