I am reading a document about HTML5. A few lines down from where I linked, a sample DOM tree is displayed for the sample HTML code given. Why is there no text node directly before the <head> element? Why is there no text node between the DOCTYPE and <html> nodes? Error or feature?

A: 

The text node before the <head> is probably an omission. You don't get a text node before the root element because most XML/HTML parsers can't deal with elements outside the root node, so they silently ignore them. The same happens if you add a comment or a processing instruction there.

Aaron Digulla
I don't have a specific example, but I'd imagine some environments are "smart" enough to filter out "irrelevant" text, i.e. non-significant white space. If a text element contains nothing but insignificant white space, it could be legitimate for an implementation to strip it out. As a developer, I'm usually grateful when this happens.
Carl Smotricz
In which way could the "CR" between "<html>" and "<head>" be more or less significant than between "</head>" and "<body>"?
Aaron Digulla
I have to agree with Aaron. While the CR between <html> and <head> is rather useless to me, it's no less important than some other CRs. I like my environments to be smart enough to obey the standard exactly, if possible. In fact, I remember being very annoyed by the differences in text nodes between IE and FF. That said, I've posted a comment on that page at http://www.w3.org/Bugs/Public/show_bug.cgi?id=10136 for clarification. Thanks for the responses.
TNi
+2  A: 

Feature. The main reason is that, given the markup

<!DOCTYPE html>
<html>
 <head>
  <title>Sample page</title>
...,

some people expect

document.documentElement.firstChild

to return the head element. However, if the text node were included, that is the node that would be returned.
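For code that should not depend on whether a whitespace text node comes first, one option is to walk the children and stop at the first element. The sketch below is only an illustration: firstElementChildOf is a hypothetical helper name, not something taken from the specification, and the check at the end assumes a browser-like environment where a global document exists.

```javascript
// Hypothetical helper (not from the spec or the answer above):
// return the first child of `parent` that is an element,
// skipping text, comment and other non-element nodes.
function firstElementChildOf(parent) {
  for (let node = parent.firstChild; node !== null; node = node.nextSibling) {
    if (node.nodeType === 1) { // 1 is Node.ELEMENT_NODE
      return node;
    }
  }
  return null;
}

// Only runs where a DOM is available (e.g. a browser console).
if (typeof document !== "undefined") {
  const first = firstElementChildOf(document.documentElement);
  console.log(first && first.nodeName);
}
```

In current browsers the built-in document.head and element.firstElementChild cover the same need without a helper; the loop just makes the skipping of non-element nodes explicit.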

(Note, also, that the new line between </body> and </html> ends up in the body element.)

Ms2ger
I like this answer a lot, because you also noticed the extra CR after <body>. I did too, but wasn't sure what to think at the time. So, it absolutely could be a decision (undocumented?) made for legacy reasons. At the same time, the group is willing to make other changes to the standard on the basis of making things less confusing. Very odd. I'm going to let the question stand for a bit, just so I can figure out exactly what is going on. If you're interested, I've also requested clarification at http://www.w3.org/Bugs/Public/show_bug.cgi?id=10136
TNi
The editor is willing to change the specification to make things less confusing, but only if web pages don't rely on the particular behavior, and if browser vendors are willing to implement the change. That wasn't the case here.
Ms2ger