What is the full set of characters allowed in CSS identifiers such as id and class names?

Is there a regular expression that I can use to validate them? Is it browser-agnostic?

+8  A: 

The charset doesn't matter; what matters is which characters are allowed. Check the CSS specification. Here's the relevant quote:

In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A1 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F".
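To make the escaping rule in that quote concrete, here is a minimal Java sketch. The class name `CssEscapeSketch` and the helper `escapeIdent` are hypothetical names made up for illustration; they are not part of any library. It assumes the rule exactly as quoted: characters outside [a-zA-Z0-9], hyphen, underscore, and U+00A1 and higher are written as numeric escapes, and so is a leading digit or a digit right after a leading hyphen. A space is added after each escape except at the very end of the string.

```java
public class CssEscapeSketch {

    // Hypothetical helper: rewrites a raw string as a CSS identifier using
    // numeric escapes (backslash + hex code point) wherever the quoted rule
    // does not allow the character to appear as-is.
    static String escapeIdent(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            int cp = raw.codePointAt(i);
            int next = i + Character.charCount(cp);
            boolean digit = cp >= '0' && cp <= '9';
            boolean plain = (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')
                    || digit || cp == '-' || cp == '_' || cp >= 0xA1;
            // A digit may not start the identifier, nor follow a leading hyphen.
            boolean badStart = digit && (i == 0 || (i == 1 && raw.charAt(0) == '-'));
            if (plain && !badStart) {
                out.appendCodePoint(cp);
            } else {
                out.append('\\').append(Integer.toHexString(cp).toUpperCase());
                // The space terminates the escape so a following hex digit
                // is not read as part of it; omitted after the last character.
                if (next < raw.length()) {
                    out.append(' ');
                }
            }
            i = next;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeIdent("B&W?")); // prints B\26 W\3F
    }
}
```

This is only a sketch: it does not handle the empty string, a lone hyphen, or the NUL character, and a caller that appends text directly after a trailing escape still has to add the terminating space itself.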

Update: As to the regex question, you can find the grammar here:

ident      -?{nmstart}{nmchar}*

Which consists of these parts:

nmstart    [_a-z]|{nonascii}|{escape}
nmchar     [_a-z0-9-]|{nonascii}|{escape}
nonascii   [\200-\377]
escape     {unicode}|\\[^\r\n\f0-9a-f]
unicode    \\{h}{1,6}(\r\n|[ \t\r\n\f])?
h          [0-9a-f]

This can be translated to a Java regex as follows (I only added parentheses to parts containing the OR and escaped the backslashes):

String h = "[0-9a-f]";
String unicode = "\\\\{h}{1,6}(\\r\\n|[ \\t\\r\\n\\f])?".replace("{h}", h);
String escape = "({unicode}|\\\\[^\\r\\n\\f0-9a-f])".replace("{unicode}", unicode);
// Octal \200-\377 in the grammar is \x80-\xFF in hex; Java's \x takes exactly two hex digits.
String nonascii = "[\\x80-\\xFF]";
String nmchar = "([_a-z0-9-]|{nonascii}|{escape})".replace("{nonascii}", nonascii).replace("{escape}", escape);
String nmstart = "([_a-z]|{nonascii}|{escape})".replace("{nonascii}", nonascii).replace("{escape}", escape);
String ident = "-?{nmstart}{nmchar}*".replace("{nmstart}", nmstart).replace("{nmchar}", nmchar);

System.out.println(ident); // The full regex.
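For completeness, here is a sketch of how the assembled pattern could be used for validation. The class name `CssIdentCheck` and the method `isValidIdent` are hypothetical. It assumes the grammar's octal range \200-\377 is written in hex as \x80-\xFF, and it compiles the pattern with `Pattern.CASE_INSENSITIVE` because the grammar itself is case-insensitive.

```java
import java.util.regex.Pattern;

public class CssIdentCheck {

    // The grammar's macros, built by concatenation instead of replace().
    static final String H = "[0-9a-f]";
    static final String UNICODE = "\\\\" + H + "{1,6}(\\r\\n|[ \\t\\r\\n\\f])?";
    static final String ESCAPE = "(" + UNICODE + "|\\\\[^\\r\\n\\f0-9a-f])";
    // Assumption: octal \200-\377 from the grammar, written in hex.
    static final String NONASCII = "[\\x80-\\xFF]";
    static final String NMSTART = "([_a-z]|" + NONASCII + "|" + ESCAPE + ")";
    static final String NMCHAR = "([_a-z0-9-]|" + NONASCII + "|" + ESCAPE + ")";

    static final Pattern IDENT =
            Pattern.compile("-?" + NMSTART + NMCHAR + "*", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true when the whole string matches the ident rule.
    static boolean isValidIdent(String s) {
        return IDENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidIdent("Foo-Bar_1"));   // true
        System.out.println(isValidIdent("1foo"));        // false
        System.out.println(isValidIdent("B\\26 W\\3F")); // true
        System.out.println(isValidIdent("B&W?"));        // false
    }
}
```

Note that `matches()` anchors the pattern to the whole input, so no explicit `^` and `$` are needed. The nonascii range only covers U+0080 to U+00FF, mirroring the 8-bit grammar; widening it to higher code points would be a separate decision.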

Update 2: Oh, you're more of a PHP'er. Well, I think you can figure out how and where to use str_replace?

BalusC
amphetamachine
THANK YOU! That's just awesome! :D I thought it was very limited, but I didn't know I could use `\` as an escape character. Has anyone ever built a regex to validate the allowed chars?
Alix Axel
That's perfect, and yes I can figure it out. =) Thanks again!
Alix Axel
You're welcome. Don't forget to make it case insensitive or to lowercase the identifier beforehand.
BalusC
+2  A: 

This question appears to be a duplicate of s.o. Q448981: What characters are valid in CSS class names?

pyrony
Maybe it's your low reputation, but this should really have been posted as a comment rather than as an answer, since it adds nothing beyond the link.
BalusC
@BalusC: Actually the accepted answer on that question has an almost complete working regex. I say almost because it doesn't include the escaped chars.
Alix Axel
Sorry about that, thanks for the heads-up.
pyrony