Is there any information on how to correctly handle white spaces in XHTML (1.0 Transitional)? It seems as if XHTML does not use standard XML white space handling.

Edit: Maybe I was a bit imprecise about what exactly I was looking for. I'm more interested in how an element gets rendered than in how it would be processed by an XML processor. For example, the following will render with a single space between the emphasized text and the text that follows:

<em> em content </em> following text

The situation gets more complicated if the space has its own formatting. For example,

<a href="http://www.google.de"> content of the hyperlink </a> content after the hyperlink

will have an underlined space at the end of the hyperlink, while

<a href="http://www.google.de"> content of the hyperlink</a> content after the hyperlink

will not underline the space.

It seems as if the space is always attached to the preceding formatting scope, and white space is collapsed across the start and end tags of inline elements. But this is based solely on testing, and I was wondering whether there is some kind of specification of how exactly this behaves.

+2  A: 

From the W3C Recommendation:

4.7. White Space handling in attribute values

When user agents process attributes, they do so according to Section 3.3.3 of [XML]:

  • Strip leading and trailing white space.
  • Map sequences of one or more white space characters (including line breaks) to a single inter-word space.
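
To make the two steps above concrete, here is a minimal Python sketch. The function name normalize_attribute_value is my own and not part of any specification, and the sketch only mirrors the two bullets quoted above, not the full attribute-value normalization an XML parser performs.

```python
import re

# The four XML white space characters: space, tab, carriage return, line feed.
XML_WS = " \t\r\n"

def normalize_attribute_value(value: str) -> str:
    """Hypothetical helper: apply the two quoted steps to an attribute value."""
    # Step 1: strip leading and trailing white space.
    stripped = value.strip(XML_WS)
    # Step 2: map each run of white space (including line breaks) to one space.
    return re.sub(r"[ \t\r\n]+", " ", stripped)

print(repr(normalize_attribute_value("  nav \n\t main  ")))
```

A class attribute written as "  nav \n\t main  " would therefore be treated like "nav main".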

For white space between tags, see Section 3.2, criterion 9:

3.2. User Agent Conformance

[1-8 snipped]

9. White space is handled according to the following rules. The following characters are defined in [XML] white space characters:

  • SPACE (&#x0020;)
  • HORIZONTAL TABULATION (&#x0009;)
  • CARRIAGE RETURN (&#x000D;)
  • LINE FEED (&#x000A;)

The XML processor normalizes different systems' line end codes into one single LINE FEED character, that is passed up to the application.
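
As a rough illustration of that line-end normalization, assuming the usual three conventions (CR LF, a lone CR, and a lone LF), a sketch in Python could look like this. The function name is an assumption for illustration; a real XML parser does this internally.

```python
def normalize_line_ends(text: str) -> str:
    """Hypothetical helper: map CR LF and lone CR to a single LINE FEED."""
    # Handle CR LF first so the pair does not turn into two line feeds.
    return text.replace("\r\n", "\n").replace("\r", "\n")

print(repr(normalize_line_ends("one\r\ntwo\rthree\nfour")))
```

After this step the application only ever sees LINE FEED as a line break, whatever the source system used.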

The user agent must use the definition from CSS for processing whitespace characters [CSS2]. Note that the CSS2 recommendation does not explicitly address the issue of whitespace handling in non-Latin character sets. This will be addressed in a future version of CSS, at which time this reference will be updated.

Also see section C.15:

C.15. White Space Characters in HTML vs. XML

Some characters that are legal in HTML documents, are illegal in XML document. For example, in HTML, the Formfeed character (U+000C) is treated as white space, in XHTML, due to XML's definition of characters, it is illegal.
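
One way to see the form feed difference for yourself is to feed a fragment to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed is my own, and the check covers well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # The parser rejects characters outside XML's legal character range.
        return False

print(is_well_formed("<p>one two</p>"))      # ordinary space
print(is_well_formed("<p>one\x0ctwo</p>"))   # form feed (U+000C)
```

The first fragment parses, while the one containing the form feed is rejected as not well-formed.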

Bradley Mountford
Does this also affect spaces between tags? This seems to be limited to attribute values.
JPW
Good question! See my edits above.
Bradley Mountford
Thank you for the answer (and for the effort you've put into it), but this is not quite what I was looking for (see my edits in the original question).
JPW
A: 

It seems that there is no real documentation on how white spaces are rendered in XHTML. Here is what I found out by experiment:

  1. Runs of white space are collapsed into a single space, even across start and end tags within the same block.
  2. The space belongs to the formatting scope of the element that contains it. If the run spans two elements, the space is attributed to the first one.
  3. Spaces at the beginning and end of a block element are ignored, as are those at the edge of an inline element (such as a span) that is the first or last child of its block.
  4. White space outside of block elements is ignored.
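
The first three observations can be sketched in a few lines of Python. This is only a model of what I saw in testing, not how any browser is implemented: it assumes the content of one block has already been split into (style, text) runs, and the function name collapse_runs and the style labels are made up for illustration.

```python
import re

def collapse_runs(runs):
    """Collapse white space across the inline runs of a single block.

    runs: list of (style, text) pairs in document order (assumed input shape).
    """
    out = []
    prev_space = True  # at the start of the block a leading space is dropped
    for style, text in runs:
        # Rule 1: every run of white space becomes a single space.
        collapsed = re.sub(r"[ \t\r\n]+", " ", text)
        # Rule 2: if the previous run already ended in a space, that run
        # keeps it and this run loses its leading space.
        if prev_space and collapsed.startswith(" "):
            collapsed = collapsed[1:]
        if collapsed:
            prev_space = collapsed.endswith(" ")
            out.append((style, collapsed))
    # Rule 3: a space at the very end of the block is dropped.
    while out and out[-1][1].endswith(" "):
        style, text = out.pop()
        if text[:-1]:
            out.append((style, text[:-1]))
    return out

# <em> em content </em> following text
print(collapse_runs([("em", " em content "), ("plain", " following text")]))
# <a> content of the hyperlink</a> content after the hyperlink
print(collapse_runs([("a", " content of the hyperlink"),
                     ("plain", " content after the hyperlink")]))
```

In the first case the single remaining space ends up inside the em run; in the second the space stays with the plain text after the link, which matches the underlined and non-underlined spaces described in the question.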

This is all I could figure out. It is kind of sad that the XHTML specification does not contain information about the rendering of white space.

JPW