- <, & and " in attribute values where " is the delimiter: use &lt;, &amp; and &quot;, respectively.
These are predefined entities in XML so will work with any parser regardless of whether it reads the document type. They are also normal defined entities in HTML.
Numeric character references are just as valid, but slightly harder to read.
- > in text content: use &gt; or leave as-is.
A bare > doesn't normally need escaping: it's perfectly legal in an attribute value at all times, and it's legal in text content as long as it doesn't form part of a ]]> sequence. (This is an obscure, pointless and sometimes-ignored part of the XML spec.) You might prefer to always escape it in text content anyway, just to be safe and not have to remember this rule. (That's what Canonical XML does.)
Numeric character references are just as valid, but slightly harder to read.
- ' in attribute values where ' is the delimiter: use &#39;.
The numeric character reference is most correct here, because the XML predefined entity &apos; isn't technically defined by the HTML4 standard (even though it will work in all current browsers). The lateness of adding this entity reflects the common practice of always using " as the attribute value delimiter.
- non-ASCII characters: include as-is
As long as you're using and declaring UTF-8 you can just spit the characters straight out. Smaller, more readable results.
- non-ASCII characters (without Unicode): use numeric character reference
If for some reason you can't use UTF-8 (boooo!!!), use a character reference like &#233; in preference to the HTML entities. The HTML entities only cover a very small portion of the Unicode character set anyway, so you might as well use character references for everything, IMO. I personally prefer the &#x... hex escapes for non-ASCII characters, as it is traditional to refer to Unicode characters by their ‘U+xxxx’ hex code.
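To make the rules above concrete, here is a minimal Python sketch. The function names (escape_text, escape_attr, to_ascii) are made up for illustration and don't come from any library; treat it as one way to apply the rules, not the only one.

```python
def escape_text(s: str) -> str:
    """Escape a string for use as XML/HTML text content."""
    # Replace & first, so the references added afterwards are not re-escaped.
    # > is escaped too, which sidesteps the ]]> rule entirely.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_attr(s: str, delimiter: str = '"') -> str:
    """Escape a string for use inside an attribute value."""
    s = s.replace("&", "&amp;").replace("<", "&lt;")
    if delimiter == '"':
        return s.replace('"', "&quot;")
    if delimiter == "'":
        # Numeric reference rather than &apos;, which HTML4 does not define.
        return s.replace("'", "&#39;")
    raise ValueError("delimiter must be a single or double quote")


def to_ascii(s: str) -> str:
    """Replace non-ASCII characters with hex character references."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in s)


print(escape_text("a < b && c ]]> d"))
print(escape_attr('say "hi" & <go>'))
print(escape_attr("it's", delimiter="'"))
print(to_ascii("café"))
```

Run to_ascii after the escaping functions, not before, otherwise the & of each character reference would itself get escaped.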
Though using the HTML entities is quite valid in an XHTML document, it means the parser has to fetch external entities such as the DTD to work out what the entities are. If you stick to the predefined entities and character references you can use a lightweight non-external-entity-including XML parser without losing your ability to find text-including-entity-references in the document.
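A quick way to see the difference, assuming Python's standard xml.etree.ElementTree as the example of a parser that doesn't fetch an external DTD (the helper name parse_text is made up for this sketch):

```python
import xml.etree.ElementTree as ET


def parse_text(xml_source: str):
    """Return the root element's text, or None if the document fails to parse."""
    try:
        return ET.fromstring(xml_source).text
    except ET.ParseError:
        return None


# Predefined entities and character references need no DTD.
print(parse_text("<p>caf&#xE9; &amp; &lt;b&gt;</p>"))

# An HTML entity is undefined unless the parser has read a DTD declaring it.
print(parse_text("<p>caf&eacute;</p>"))
```

The first document parses without a DTD; the second is rejected because nothing has declared the eacute entity.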
The situation with RSS is murky, as usual with all the different RSS versions lurking about. RSS 0.91 had a DTD that included the older HTML 3.2 standard's entities, but the previous official SYSTEM URL for the DTD has gone walkies. (In an annoying and needless piece of internet vandalism, Netscape's owners, AOL, broke the link in a reorg a few years ago. Not only that but they also 302 you to their home page if you try to access it or any other address on the old site, thus serving a badly-written HTML page to clients expecting a DTD. Bad AOL, 302-404s are so bogus.)
RSS 2.0 doesn't have an official DTD at all. So either way, avoid the HTML entities and use the predefined entities and numeric character references instead.
onmouseover="tooltip_on( '<strong>Tool...
Not allowable in any document type. < is invalid in an attribute value.
onmouseover="tooltip_on( '&lt;strong>Tooltip...
Valid but unreadable. I second David's Unobtrusive JavaScript suggestion.
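If you do end up generating an inline handler like this from code, let an escaping function do the work rather than escaping by hand. A small sketch using Python's standard html.escape; the helper name inline_handler_attr and the tooltip string are hypothetical stand-ins, since the original snippet is truncated:

```python
import html


def inline_handler_attr(js: str) -> str:
    """Build an onmouseover attribute with the JavaScript safely escaped."""
    # quote=True also escapes " and ', so either delimiter would be safe.
    return f'onmouseover="{html.escape(js, quote=True)}"'


# Hypothetical handler code, standing in for the truncated original.
print(inline_handler_attr("tooltip_on('<b>Hi</b>')"))
```

The output is valid but just as unreadable as the hand-written version, which is another argument for keeping the script out of the markup.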