Underscores seem fine. What about dashes? Other special characters?

+6  A: 

The W3C spec Basic HTML data types says "ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".")."
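As a rough illustration, the rule quoted above can be written as a regular expression. This is only a sketch: the names `HTML4_ID` and `is_html4_id` are made up for this example, and it checks the HTML 4 token rule only, not uniqueness within a document.

```python
import re

# HTML 4 ID/NAME token: one ASCII letter, then any number of
# letters, digits, hyphens, underscores, colons, or periods.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def is_html4_id(value: str) -> bool:
    """Return True if value matches the HTML 4 ID token rule (hypothetical helper)."""
    # fullmatch requires the whole string to match, so spaces or a
    # leading digit anywhere cause a rejection.
    return HTML4_ID.fullmatch(value) is not None
```

With this sketch, `is_html4_id("my-thing")` and `is_html4_id("a:b.c")` are accepted, while `is_html4_id("1abc")` and `is_html4_id("my thing")` are rejected.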

RichieHindle
In practice, support for ":" is confused because of XML namespacing in XHTML documents; it's wise to avoid using it.
DDaviesBrackett
Note in particular that this means an ID can't contain spaces, which makes sense because otherwise <div id="my thing"> wouldn't match this CSS rule: #my thing { color: red }
aem
Thanks. When interacting with CSS or JS, will any of these characters cause confusion? I could see CSS getting confused by dots and colons, and I've heard that JS doesn't handle dashes well.
Josh Gibson
Similarly, the period character is often under-considered when JS and CSS APIs are defined, so it's to be avoided as well.
DDaviesBrackett
A: 

According to the HTML 4.0 specs:

ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

Philippe Gerber
+10  A: 

Actually, there is a difference between HTML and XHTML. As XHTML is XML, the rules for XML IDs apply:

Values of type ID MUST match the Name production.

NameStartChar ::=   ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
                          [#xD8-#xF6] | [#xF8-#x2FF] |
                          [#x370-#x37D] | [#x37F-#x1FFF] |
                          [#x200C-#x200D] | [#x2070-#x218F] |
                          [#x2C00-#x2FEF] | [#x3001-#xD7FF] |
                          [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
                          [#x10000-#xEFFFF]

NameChar     ::=    NameStartChar | "-" | "." | [0-9] | #xB7 |
                        [#x0300-#x036F] | [#x203F-#x2040]

Source: Extensible Markup Language (XML) 1.0 (Fifth Edition) 2.3
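The Name production above translates fairly directly into a character-class regular expression. The following is a minimal sketch, assuming Python 3; `XML_NAME` and `is_xml_name` are hypothetical names chosen for this example, and the ranges are copied from the grammar quoted above.

```python
import re

# Character ranges for NameStartChar, copied from the XML 1.0 grammar.
NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF"
    "\u200C-\u200D\u2070-\u218F"
    "\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# NameChar adds "-", ".", digits, #xB7 and two further ranges.
NAME_CHAR = NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

XML_NAME = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")


def is_xml_name(value: str) -> bool:
    """Return True if value matches the XML Name production (hypothetical helper)."""
    return XML_NAME.fullmatch(value) is not None
```

Under this sketch, `is_xml_name("_private")` and `is_xml_name(":ns")` are accepted even though HTML 4 would reject them, because "_" and ":" are valid start characters in XML. A leading digit or hyphen, as in `"1abc"` or `"-x"`, is still rejected.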

For HTML the following applies:

id = name [CS]
This attribute assigns a name to an element. This name must be unique in a document.

ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

Source: HTML 4 Specification, Chapter 6, ID Token

Ludwig Weinzierl
Significantly, this means XML names are a superset of HTML names, i.e. any valid HTML name is also a valid XML/XHTML name.
Ben Blank
+2  A: 

If we take the title of your question literally, then neither the HTML nor XHTML rules apply. Instead, the relevant spec is the DOM one.

Taking DOM Level 3 as our source, and assuming that by "DOM ID" you mean an attribute with the "ID" flag set, then the value is a "DOMString", the characters of which can be any UTF-16 encodable character.

16-bit unit

The base unit of a DOMString. This indicates that indexing on a DOMString occurs in units of 16 bits. This must not be misunderstood to mean that a DOMString can store arbitrary 16-bit units. A DOMString is a character string encoded in UTF-16; this means that the restrictions of UTF-16 as well as the other relevant restrictions on character strings must be maintained. A single character, for example in the form of a numeric character reference, may correspond to one or two 16-bit units.
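To make the "one or two 16-bit units" point concrete, here is a small sketch in Python (not a DOM API; `utf16_units` is a hypothetical helper) that counts how many UTF-16 code units a string needs.

```python
def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text as UTF-16."""
    # "utf-16-le" omits the byte order mark, so every 2 bytes is one unit.
    return len(text.encode("utf-16-le")) // 2
```

A character in the Basic Multilingual Plane such as `"a"` takes one unit, while a character outside it, such as U+1D11E (musical G clef), takes two. A lone surrogate such as `"\ud800"` cannot be encoded at all in Python (it raises `UnicodeEncodeError`), which mirrors the point that a DOMString is not meant to hold arbitrary 16-bit units.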

Of course, this is probably not what you want, and Ludwig Weinzierl's answer is probably what you were looking for. However, it is wise to understand that not all DOMs can be serialized as HTML or XHTML, and that the DOM has its own set of rules.

Alohci
A: 

For the purposes of valid HTML (and XHTML), Philippe is correct. No spaces or special characters (certainly none that require escaping) are allowed in id attributes. Just the 52 letters of the alphabet (upper and lower case), the numerals 0-9, hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

The Enormous Pianist