I am reading a document about HTML5. A few lines down from where I linked, a sample DOM tree is displayed for the sample HTML code given. Why is there no text node directly before the <head> element? Why is there no text node between the DOCTYPE and <html> nodes? Error or feature?
A:
The text node before the <head> is probably an omission. You don't get a text node before the root element because most XML/HTML parsers can't deal with text outside the root element, so they silently ignore it. The same happens if you add a comment or a processing instruction there.
Aaron Digulla
2010-07-12 09:56:06
I don't have a specific example, but I'd imagine some environments are "smart" enough to filter out "irrelevant" text, i.e. non-significant white space. If a text element contains nothing but insignificant white space, it could be legitimate for an implementation to strip it out. As a developer, I'm usually grateful when this happens.
Carl Smotricz
2010-07-12 10:07:41
In which way could the "CR" between "<html>" and "<head>" be more or less significant than between "</head>" and "<body>"?
Aaron Digulla
2010-07-12 11:35:23
I have to agree with Aaron. While the CR between <html> and <head> is rather useless to me, it's no less important than some other CRs. I like my environments to be smart enough to obey the standard exactly, if possible. In fact, I remember being very annoyed by the differences in text nodes between IE and FF. That said, I've posted a comment on that page at http://www.w3.org/Bugs/Public/show_bug.cgi?id=10136 for clarification. Thanks for the responses.
TNi
2010-07-12 18:28:47
+2
A:
Feature. The main reason is that, given the markup
<!DOCTYPE html>
<html>
<head>
<title>Sample page</title>
...,
some people expect document.documentElement.firstChild to return the head element. However, if the text node were included, that is the node that would be returned.
(Note, also, that the new line between </body> and </html> ends up in the body element.)
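For code that should not depend on whether a parser keeps that whitespace, a minimal sketch is to walk the children and skip anything that is not an element. The helper name firstElementChildOf is made up for illustration; it relies only on the standard firstChild, nextSibling and nodeType properties, and the plain objects below merely stand in for DOM nodes so the snippet also runs outside a browser.

```javascript
// nodeType value for element nodes, as defined by the DOM standard.
const ELEMENT_NODE = 1;

// Hypothetical helper: return the first child of `parent` that is an
// element, skipping text, comment and other non-element nodes.
// Returns null when there is no element child.
function firstElementChildOf(parent) {
  let node = parent.firstChild;
  while (node !== null && node !== undefined) {
    if (node.nodeType === ELEMENT_NODE) {
      return node;
    }
    node = node.nextSibling;
  }
  return null;
}

// Stand-in objects (an assumption for this demo, not real DOM nodes):
// an <html> element whose first child is a whitespace text node,
// followed by <head>. This is the tree a parser would build if it
// kept the newline between <html> and <head>.
const head = { nodeType: 1, nodeName: "HEAD", nextSibling: null };
const whitespace = { nodeType: 3, nodeName: "#text", nextSibling: head };
const html = { nodeType: 1, nodeName: "HTML", firstChild: whitespace };

console.log(html.firstChild.nodeName);                // prints "#text"
console.log(firstElementChildOf(html).nodeName);      // prints "HEAD"
```

In a browser, firstElementChildOf(document.documentElement) should give the head element whether or not the text node is present; document.head is a more direct route where it is available.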
Ms2ger
2010-07-12 14:27:46
I like this answer a lot, because you also noticed the extra CR after <body>. I did too, but wasn't sure what to think at the time. So, it absolutely could be a decision (undocumented?) made for legacy reasons. At the same time, the group is willing to make other changes to the standard on the basis of making things less confusing. Very odd. I'm going to let the question stand for a bit, just so I can figure out exactly what is going on. If you're interested, I've also requested clarification at http://www.w3.org/Bugs/Public/show_bug.cgi?id=10136
TNi
2010-07-12 18:34:57
The editor is willing to change the specification to make things less confusing, but only if web pages don't rely on the particular behavior, and if browser vendors are willing to implement the change. That wasn't the case here.
Ms2ger
2010-07-13 16:51:05