What regular expression can be used to validate a CSS selector, and do so in a way that makes an invalid selector fail quickly?

Valid selectors:

EE
#myid
.class
.class.anotherclass
EE .class
EE .class EEE.anotherclass
EE[class="test"]
.class[alt~="test"]
#myid[alt="test"]
EE:hover
EE:first-child
E[lang|="en"]:first-child
EE#test .class>.anotherclass
EE#myid.classshit.anotherclass[class~="test"]:hover
EE#myid.classshit.anotherclass[class="test"]:first-child EE.Xx:hover

Invalid selectors, e.g. ones that contain extra whitespace, including at the end of the line:

EE:hover   EE
EE .class EEE.anotherclass 
EE#myid.classshit.anotherclass[class="test"]:first-child EE.Xx:hov     9
EE#myid.classshit.anotherclass[class="test"]:first-child EE.Xx:hov  -daf
+2  A: 

Regular expressions are the wrong tool. CSS selectors are far too complex. Example:

bo\
dy:not(.\}) {}

Use a parser with a real tokenizer, such as PHP-CSS-Parser. Porting it to Java is easier than getting a regex right.
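
To give a feel for the character-by-character approach, here is a minimal hand-written scanner in Java. It is only a sketch: it covers just the small subset shown in the question (element names, #id, .class, :pseudo, simple [attr="value"] tests, and a single space or > as combinator), and it deliberately ignores escapes, :not(), and everything else a real CSS tokenizer has to handle. The class and method names are made up for illustration and are not taken from PHP-CSS-Parser. Every step either consumes a character or fails, so an invalid input is rejected after one pass over the string.

```java
final class SelectorScanner {
    private final String s;
    private int pos = 0;

    private SelectorScanner(String s) { this.s = s; }

    /** True if the input is a selector in the small subset described above. */
    static boolean isValid(String input) {
        SelectorScanner sc = new SelectorScanner(input);
        if (!sc.compound()) return false;
        while (sc.pos < input.length()) {
            char c = input.charAt(sc.pos);
            // Exactly one combinator character, then another compound selector.
            if (c != ' ' && c != '>') return false;
            sc.pos++;
            if (!sc.compound()) return false;
        }
        return true;
    }

    // compound := ident? ('#' ident | '.' ident | ':' ident | attribute)*
    // and it must consume at least one character.
    private boolean compound() {
        int start = pos;
        ident(); // optional element name
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (c == '#' || c == '.' || c == ':') {
                pos++;
                if (!ident()) return false;
            } else if (c == '[') {
                if (!attribute()) return false;
            } else {
                break;
            }
        }
        return pos > start;
    }

    // attribute := '[' ident (('=' | '~=' | '|=') '"' non-quote* '"')? ']'
    private boolean attribute() {
        pos++; // consume '['
        if (!ident()) return false;
        boolean prefixed = peek() == '~' || peek() == '|';
        if (prefixed) pos++;
        if (peek() == '=') {
            pos++;
            if (peek() != '"') return false;
            pos++;
            while (pos < s.length() && s.charAt(pos) != '"') pos++;
            if (peek() != '"') return false; // unterminated string
            pos++;
        } else if (prefixed) {
            return false; // '~' or '|' without '='
        }
        if (peek() != ']') return false;
        pos++;
        return true;
    }

    // ident := (letter | '_') (letter | digit | '-' | '_')*
    private boolean ident() {
        int start = pos;
        if (pos < s.length() && (Character.isLetter(s.charAt(pos)) || s.charAt(pos) == '_')) {
            pos++;
            while (pos < s.length() && isIdentPart(s.charAt(pos))) pos++;
        }
        return pos > start;
    }

    private static boolean isIdentPart(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_';
    }

    // Current character, or '\0' at end of input.
    private char peek() {
        return pos < s.length() ? s.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(isValid("EE#test .class>.anotherclass")); // true
        System.out.println(isValid("EE:hover   EE"));                // false
    }
}
```

Note that this sketch also rejects the escaped example above, which is exactly the kind of input that pushes you toward a full tokenizer.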

toscho
A: 

The problem with yer typical regular expression is that it is unable to handle arbitrary levels of nesting. It has no memory. Consider a string of some number of a's followed by the same number of b's, such as aaabbb, and a reasonable regexp, a*b*. When the regexp gets to the first 'b', it has no memory of how many a's it recognized, and therefore it can't require the same number of b's.

Now replace a and b with ( and ), IF and END, <x> and </x> etc... and you can see the problem.
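
A small Java sketch makes the point concrete (the class and method names here are hypothetical, chosen only for this illustration). The regexp a*b* accepts unbalanced strings, while a hand-written check that keeps a counter, i.e. has the memory the regexp lacks, accepts only equal counts.

```java
import java.util.regex.Pattern;

final class BalancedDemo {
    // The regexp from the text: any number of a's, then any number of b's.
    static final Pattern A_STAR_B_STAR = Pattern.compile("a*b*");

    // Accepts n a's followed by exactly n b's by counting up, then down.
    static boolean sameCount(String s) {
        int i = 0;
        int count = 0;
        while (i < s.length() && s.charAt(i) == 'a') { count++; i++; }
        while (i < s.length() && s.charAt(i) == 'b') { count--; i++; }
        // Valid only if the whole string was consumed and the counts cancel.
        return i == s.length() && count == 0;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aaabbb", "aaabb", "abbb"}) {
            System.out.println(s
                    + " regex=" + A_STAR_B_STAR.matcher(s).matches()
                    + " counter=" + sameCount(s));
        }
        // aaabbb regex=true counter=true
        // aaabb regex=true counter=false
        // abbb regex=true counter=false
    }
}
```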

Tony Ennis