Does XHTML5 support character entities such as `&nbsp;` and `&mdash;`? At work we can require specific software to access the admin side of the site, and people are demanding multi-file upload. For me this is an easy justification to require migrating to FF 3.6+, so I'll be doing it soonish. We currently use XHTML 1.1, and upon moving to HTML5, I'm only having issues with character entity names... Does anyone have a doc on this?

I see there is a list in the WHATWG spec, but I'm not sure whether it applies to files served as application/xhtml+xml. In any case, the two entities mentioned trigger errors in both Chromium nightly and FF 3.6.

+2  A: 

There is no DTD for XHTML5, so an XML parser will see no entity definitions (other than the predefined ones). If you wanted to use an entity you would have to define it for yourself in the internal subset.

<!DOCTYPE html [
    <!ENTITY mdash "—">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
    ... &mdash; ...
</html>

(Of course using the internal subset is likely to trip browsers up if you serve it to them as text/html. Sending an internal subset in a non-XHTML HTML5 document is disallowed.)
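
As a rough check of the above, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. It is a non-validating, expat-based XML parser standing in for a browser's XML parser, which is an assumption on my part. The helper name `text_of` and the sample documents are made up for illustration. It shows that `&mdash;` is an error without a declaration, works once declared in the internal subset, and that the predefined entities and numeric character references need no DTD at all.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample documents, not taken from any real site.
NO_DTD = f'<html xmlns="{XHTML_NS}"><p>a &mdash; b</p></html>'
WITH_SUBSET = (
    "<!DOCTYPE html [\n"
    '    <!ENTITY mdash "&#8212;">\n'
    "]>\n" + NO_DTD
)
PREDEFINED_ONLY = f'<html xmlns="{XHTML_NS}"><p>a &amp; b &#8212; c</p></html>'


def text_of(document):
    """Return the document's text content, or None if it is not well-formed."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return None
    return "".join(root.itertext())


if __name__ == "__main__":
    for label, doc in [
        ("no DTD", NO_DTD),
        ("internal subset", WITH_SUBSET),
        ("predefined + numeric", PREDEFINED_ONLY),
    ]:
        print(label, "->", repr(text_of(doc)))
```

Only the first document should fail to parse; browsers report the same kind of undefined-entity error for application/xhtml+xml.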

The HTML5 wiki currently recommends:

Do not use entity references in XHTML (except for the 5 predefined entities: &amp;, &lt;, &gt;, &quot; and &apos;)

And I agree with this advice not just for XHTML5 but for XML and HTML in general. There's little reason to be using the HTML entities for anything today. Unicode characters typed directly are far more readable for everyone, and &#...; character references are available for those sad cases when you can't guarantee an 8-bit/encoding-clean transport. (Since HTML entities are not defined for the majority of Unicode characters, you are going to need those anyway.)
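
If you are migrating existing XHTML 1.1 markup, one possible approach is to rewrite the named entities mechanically. Below is a minimal sketch in Python, assuming the HTML 4 entity table in the standard library's `html.entities.name2codepoint` covers the names you use. `replace_named_entities` is a hypothetical helper, and the regex is naive: it does not skip CDATA sections or comments.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser knows without a DTD; leave these alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_entities(markup, numeric=False):
    """Replace HTML 4 named entities with literal characters.

    With numeric=True, emit decimal character references instead, for
    transports that are not encoding-clean. Unknown names are left as-is.
    """
    def _sub(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        codepoint = name2codepoint[name]
        return f"&#{codepoint};" if numeric else chr(codepoint)

    return _ENTITY_RE.sub(_sub, markup)


if __name__ == "__main__":
    sample = "a&nbsp;b &mdash; c &amp; d"
    print(repr(replace_named_entities(sample)))
    print(replace_named_entities(sample, numeric=True))
```

Run it over a copy of your templates first and diff the result, since anything that merely looks like an entity reference will be rewritten too.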

bobince
How is `&#12345;` more readable than `&mdash;`?
Evan Carroll
If you want readability, just type a ‘—’ character. There's no point trying to learn all the HTML entity names. Use the real character; paste it from the character map if you have to, but there are easier ways to input these characters if you do it a lot. (On my keyboard, shift-alt-minus produces it, for example.)
bobince
I upvoted that comment, because it is true, but what about "&nbsp;"? How is that less readable than ` `?
Evan Carroll
It would seem better if they would just formalize [these](http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html#named-character-references) into the internal HTML5 DTD, rather than leave it empty.
Evan Carroll
There is no HTML5 DTD, empty or otherwise, XML-based or otherwise! WHATWG took the position that DTD was an outmoded and insufficient schema language to describe HTML5. (And it is, it's bloody awful. The XML version is a bit more sane than the horrific SGML original, but still plenty nasty.) So HTML5 defines a new, non-SGML serialisation for plain-HTML that has many predefined entities. But for the XML serialisation XHTML5, no such strategy is possible as the only way to have an entity in XML is with a DTD (internal or external).
bobince
Which is why most XML users today never use entity references. Here's a more readable non-breaking space for you: ‘ ’. (Shift-space on my keyboard, FWIW!)
bobince
Right, unfortunately it looks no different to the eye reading the source.
Evan Carroll