Basically, I am attempting to extract the last tag name from a handful of different CSS selectors.

I have successfully implemented this in JavaScript, but I'm looking for a more compact way, preferably using only one regular expression.

Here is what I currently have working.

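// Trim the selector, blank out [attr] selectors (their contents conflict with the split), and normalize whitespace around > and ~
selector = selector.replace(/^(\s|\u00A0)+|(\s|\u00A0)+$/g, '');
selector = selector.replace(/\[[^\]]*\]/g, '[]').replace(/\s*([>~\s])\s*/g, '$1');
var theSplit = selector.split(/[>~\s]/);
// exec() returns a match array, so take element [0]; an empty match (no tag name) falls back to "*"
selector = /^[^.#\[:]*/.exec(theSplit[theSplit.length - 1])[0] || "*";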

I only need to support CSS 2.0 selectors that are 100% supported by Internet Explorer 7.

For example, the + and :first-child selectors are static in IE7, so I have no need to support them. Here is a list of CSS selectors that must work.

#test span ul a
#test >span[style="background:green"]
#id + span ~ article.class
section header.class
body
div
body div
div p
div > p
div ~ p
div[class^=exa][class$=mple]
div p a
.note
div.example
ul .tocline2
#title
h1#title
div #title
ul.toc li.tocline2
ul.toc > li.tocline2
a[href][lang][class]
div[class]
div[class=example]
div[class^=exa]
div[class$=mple]
div[class*=e]
div[class|=dialog]
div[class!=made_up]
div[class~=example]

Edit: I ended up using this script. It even takes the universal selector into consideration.

var lastTagName = selector.replace(/\[[^\]]+\]|[\.#][\w-]+|\s*$/g, '').replace(/^.*[^\w]/, '') || '*';
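
As a quick sanity check, the script from the edit can be wrapped in a function and run against a few selectors from the list. This is only a sketch: the function name and the `samples` table are made up for illustration, and the expected values are what the two `replace` calls should produce for each input.

```javascript
// Wraps the one-liner from the edit in a function (the name is hypothetical).
function lastTagName(selector) {
  return selector
    .replace(/\[[^\]]+\]|[\.#][\w-]+|\s*$/g, '') // strip [attr], .class, #id, trailing space
    .replace(/^.*[^\w]/, '') || '*';             // keep the last word, or fall back to "*"
}

// A few selectors from the list, with the tag name each one should yield.
var samples = {
  '#test span ul a': 'a',
  '#test >span[style="background:green"]': 'span',
  '#id + span ~ article.class': 'article',
  'div[class^=exa][class$=mple]': 'div',
  'ul .tocline2': '*',
  'h1#title': 'h1',
  'div #title': '*',
  'ul.toc > li.tocline2': 'li',
  'a[href][lang][class]': 'a'
};

for (var s in samples) {
  console.log(s, '->', lastTagName(s));
}
```

Selectors whose last part has no tag name (such as `ul .tocline2`) end up as an empty string after both replacements, which is why the `|| '*'` fallback is there.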
A: 

The stuff at the beginning of the selector should be easy enough to remove as a last step, so let's focus on removing the two types of junk from the end first. That's stuff in square brackets and stuff following a . or #.

str = str.replace(/\[[^\]]+\]/g, '').replace(/[\.#][a-z0-9]+/gi, '');

That is, remove everything between square brackets (in the entire expression; it's okay if it catches stuff earlier), and then likewise for . and # selectors.

Then, finally, take out everything leading up to a non-alphanumeric character, which covers spaces, >, et cetera. This will leave only the last "word" (in this case a tag name) in the string:

str = str.replace(/^.*[^a-z0-9]/i, '');

A jsFiddle to try it out: http://jsfiddle.net/JAA98/
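
For reference, the two steps above can be put together in one small function. This is a minimal sketch; the name `stripToLastTag` is an assumption, and the regular expressions are the ones shown in this answer.

```javascript
// Hypothetical wrapper around the two steps described in the answer.
function stripToLastTag(str) {
  // Step 1: drop [attr] blocks, then .class and #id parts, anywhere in the string.
  str = str.replace(/\[[^\]]+\]/g, '').replace(/[\.#][a-z0-9]+/gi, '');
  // Step 2: drop everything up to the last non-alphanumeric character.
  return str.replace(/^.*[^a-z0-9]/i, '');
}

console.log(stripToLastTag('ul.toc > li.tocline2'));          // "li"
console.log(stripToLastTag('div[class^=exa][class$=mple]'));  // "div"
console.log(stripToLastTag('.note'));                         // "" (no tag name present)
```

Two things to watch: a selector that ends in a bare class or id gives an empty string, so a fallback such as `|| '*'` may be wanted, and `[a-z0-9]` does not cover hyphens or underscores, so `div.my-class` would come out as `class` rather than `div`.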

VoteyDisciple
Ultimately I ended up using your script (with a couple of modifications). Do you know of any way to simplify it to one expression?
Lime
The power of this approach is, in fact, that it takes two steps. The first two replacements (looking for square brackets and for `.`/`#` modifiers) can be combined into one expression with an `|`, but combining the two steps together would require us to describe exactly how different bits of the selectors repeat. Added bonus: for the most part, it's actually readable.
VoteyDisciple
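
To illustrate the `|` suggestion from the comment, here is a rough sketch that merges the two stripping patterns into one `replace` and keeps the final "last word" step separate. The function names are hypothetical; the patterns are the answer's own.

```javascript
// Chained version: one replace per kind of junk, as in the answer.
function stripChained(str) {
  return str.replace(/\[[^\]]+\]/g, '').replace(/[\.#][a-z0-9]+/gi, '');
}

// Combined version: a single replace with | between the two patterns.
function stripCombined(str) {
  return str.replace(/\[[^\]]+\]|[\.#][a-z0-9]+/gi, '');
}

// The second step stays on its own: keep only the last "word".
function lastTag(str) {
  return stripCombined(str).replace(/^.*[^a-z0-9]/i, '');
}

var tried = [
  'div[class^=exa][class$=mple]',
  'ul.toc > li.tocline2',
  '#test >span[style="background:green"]',
  'a[href][lang][class]'
];

tried.forEach(function (sel) {
  // Both stripping variants should agree on these inputs.
  console.log(sel, '->', lastTag(sel), stripChained(sel) === stripCombined(sel));
});
```

The two variants agree on the selectors tried here. They are not guaranteed to be identical for arbitrary input, since the chained form removes all brackets before it looks for `.` and `#`.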