Hi everyone, can anyone help me find method names in JavaScript files using regular expressions?
(?!function\s+)([_$a-zA-Z][_$a-zA-Z0-9]*)(?=\s*\()
There are many issues you can run into when trying to parse JavaScript with a regexp. First, there are three kinds of input that a lexer would normally skip over, but that a regexp has to handle explicitly.
WhiteSpace LineTerminator Comment
Now the concept of white space is not as simple as a space character. Here is a full list of characters that must be covered in your regexp.
WhiteSpace: '\u0009' '\u000b' '\u000c' '\u0020' '\u00a0' '\u1680' '\u180e' '\u2000' '\u2001' '\u2002' '\u2003' '\u2004' '\u2005' '\u2006' '\u2007' '\u2008' '\u2009' '\u200a' '\u202f' '\u205f' '\u3000' '\ufeff'
Right off the bat, our regexp has ballooned in complexity. Next comes the LineTerminator production, which once again is not as simple as you would think.
LineTerminator: '\u000a' '\u000d' '\u2028' '\u2029'
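To give a feel for what the two lists above do to a pattern, here is a rough sketch that builds one character class out of them and uses it between the `function` keyword and the name. The names `WHITE_SPACE`, `LINE_TERMINATOR`, `GAP` and `findFunctionNames` are made up for this example, the identifier part is still ASCII only, and comments are not handled at all.

```javascript
// Code points copied from the WhiteSpace and LineTerminator lists above.
const WHITE_SPACE = [
  0x0009, 0x000b, 0x000c, 0x0020, 0x00a0, 0x1680, 0x180e,
  0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
  0x2007, 0x2008, 0x2009, 0x200a, 0x202f, 0x205f, 0x3000, 0xfeff,
];
const LINE_TERMINATOR = [0x000a, 0x000d, 0x2028, 0x2029];

// Turn a code point into a \uXXXX escape usable inside a character class.
function toEscape(codePoint) {
  return "\\u" + codePoint.toString(16).padStart(4, "0");
}

// Character class matching one WhiteSpace or LineTerminator character.
const GAP = "[" + WHITE_SPACE.concat(LINE_TERMINATOR).map(toEscape).join("") + "]";

// `function`, at least one gap character, a simple ASCII name, then "(".
const namedFunction = new RegExp(
  "\\bfunction" + GAP + "+([_$a-zA-Z][_$a-zA-Z0-9]*)" + GAP + "*\\(",
  "g"
);

// Collect the captured name from every match in the source text.
function findFunctionNames(source) {
  return Array.from(source.matchAll(namedFunction), (m) => m[1]);
}
```

Writing the class out by hand gives the same result; generating it from the lists just keeps the pattern readable.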
I won't go into more detail, but here are two examples of perfectly valid function definitions.
function
a() {
}
function /*Why is this comment here!!!*/ a() {
}
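Both definitions above can be picked up by a pattern that allows "skippable" input between `function` and the name. This is only a sketch under a few assumptions: it leans on `\s`, which in JavaScript is specified to cover white space and line terminators, the identifier is still ASCII only, and text that merely looks like a definition inside a string or a comment will be matched too. `SKIP`, `IDENT` and `findDeclaredNames` are names invented for the example.

```javascript
// Anything a lexer would skip between tokens: white space, line terminators,
// block comments, and line comments.
const SKIP = String.raw`(?:\s|/\*[\s\S]*?\*/|//[^\n\r\u2028\u2029]*)`;

// Simple ASCII-only identifier, same shape as in the pattern at the top.
const IDENT = String.raw`[_$a-zA-Z][_$a-zA-Z0-9]*`;

// `function`, one or more skippable items, the name, optional skippables, "(".
const declaration = new RegExp(
  String.raw`\bfunction${SKIP}+(${IDENT})${SKIP}*\(`,
  "g"
);

// Collect the captured name from every match in the source text.
function findDeclaredNames(source) {
  return Array.from(source.matchAll(declaration), (m) => m[1]);
}
```

Unlike the simple pattern, this one only looks at `function name(` declarations, so methods assigned as `obj.name = function () {}` are not found.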
So we are left with some good news and some bad news. The good news is that my simple regexp will cover most of the common cases: as long as the file is written in a sane manner, it should work just fine. The bad news is that if you want to cover every corner case, you will be left with a monstrosity of a regexp.
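For the common case, a minimal way to run the simple regexp over a file's contents might look like this. Keep in mind that the pattern only checks that a name is followed by "(", so it reports call sites as well as definitions, and keywords such as `if` have to be filtered out afterwards. The `KEYWORDS` list is a hypothetical, incomplete one, and `findNames` is a name made up for the example.

```javascript
// The simple pattern from above, with the global flag so every match is found.
const simple = /(?!function\s+)([_$a-zA-Z][_$a-zA-Z0-9]*)(?=\s*\()/g;

// Hypothetical, deliberately incomplete list of keywords that can precede "(".
const KEYWORDS = new Set(["if", "for", "while", "switch", "catch", "function"]);

// Return each distinct non-keyword name that is directly followed by "(".
function findNames(source) {
  const names = Array.from(source.matchAll(simple), (m) => m[1]);
  return [...new Set(names)].filter((name) => !KEYWORDS.has(name));
}

const sample = [
  "function add(a, b) { return a + b; }",
  "var total = add(1, 2);",
  "if (total) { log(total); }",
].join("\n");

console.log(findNames(sample)); // prints [ 'add', 'log' ]
```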
Note
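A regexp that matches every valid function identifier would be particularly horrendous, because identifiers may contain Unicode letters and \uXXXX escape sequences, not just the ASCII characters used in the pattern above.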
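If you can assume an engine with Unicode property escapes (ES2018 and later), the Unicode-letter part at least becomes manageable. The sketch below is an approximation rather than a full implementation of the grammar: it does not accept `\uXXXX` escape sequences inside names, and it does not reject reserved words. `identifierName` and `isIdentifierName` are names chosen for the example.

```javascript
// Start: "$", "_" or any ID_Start character.
// Rest:  "$", zero-width (non-)joiner, or any ID_Continue character.
// The "u" flag is required for \p{...} property escapes.
const identifierName = /^[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*$/u;

// True when the whole string looks like an identifier name.
function isIdentifierName(text) {
  return identifierName.test(text);
}
```

For example, `isIdentifierName("π")` and `isIdentifierName("$_ok1")` are accepted, while `isIdentifierName("1abc")` and `isIdentifierName("a-b")` are rejected.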