Hi everyone, can anyone help me find method names in JavaScript files using regular expressions?

+2  A: 
(?!function\s+)([_$a-zA-Z][_$a-zA-Z0-9]*)(?=\s*\()

There are many issues you can run into when trying to parse JavaScript with a regexp. First, there are a few things that a lexer would normally skip over.

WhiteSpace
LineTerminator
Comment

Now the concept of white space is not as simple as a space character. Here is a full list of characters that must be covered in your regexp.

WhiteSpace:
    '\u0009'
    '\u000b'
    '\u000c'
    '\u0020'
    '\u00a0'
    '\u1680'
    '\u180e'
    '\u2000'
    '\u2001'
    '\u2002'
    '\u2003'
    '\u2004'
    '\u2005'
    '\u2006'
    '\u2007'
    '\u2008'
    '\u2009'
    '\u200a'
    '\u202f'
    '\u205f'
    '\u3000'
    '\ufeff'
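
As a rough sketch, the list can be folded into a single character class. The names `whiteSpaceChars` and `whiteSpace` are made up for this illustration. The built-in `\s` is not an exact substitute: it also matches line terminators, and newer engines may no longer treat U+180E as white space, so the sketch collects whatever `\s` misses rather than assuming the answer.

```javascript
// Code points copied from the WhiteSpace list above.
const whiteSpaceChars = [
  '\u0009', '\u000b', '\u000c', '\u0020', '\u00a0', '\u1680', '\u180e',
  '\u2000', '\u2001', '\u2002', '\u2003', '\u2004', '\u2005', '\u2006',
  '\u2007', '\u2008', '\u2009', '\u200a', '\u202f', '\u205f', '\u3000',
  '\ufeff',
];

// The same list as one character class (U+2000 through U+200A is a range).
const whiteSpace =
  /[\u0009\u000b\u000c\u0020\u00a0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000\ufeff]/;

// Every listed character should match the explicit class.
const allCovered = whiteSpaceChars.every((ch) => whiteSpace.test(ch));

// Listed characters that this engine's built-in \s does not match.
const missingFromBuiltin = whiteSpaceChars.filter((ch) => !/\s/.test(ch));

console.log(allCovered, missingFromBuiltin.map((ch) => ch.codePointAt(0).toString(16)));
```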

Right off the bat, our regexp has ballooned in complexity. Next comes the LineTerminator production, which once again is not as simple as you would think.

LineTerminator:
    '\u000a'
    '\u000d'
    '\u2028'
    '\u2029'
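
A minimal sketch of how these four characters might be handled when splitting source text into lines. The `splitLines` helper is hypothetical, and treating `\r\n` as a single terminator is a choice made for this example.

```javascript
// Match CRLF first so it counts as one terminator, then any single
// LineTerminator character from the list above.
const lineTerminator = /\r\n|[\n\r\u2028\u2029]/;

// Hypothetical helper: split source text on any line terminator.
function splitLines(source) {
  return source.split(lineTerminator);
}

const lines = splitLines('function\r\na() {\u2028}\u2029// done\n');
console.log(lines.length); // 5: the trailing \n leaves an empty last entry
```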

I won't go into more detail, but here are two examples of perfectly valid function definitions.

function
a() {

}

function /*Why is this comment here!!!*/ a() {

}
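
To make the two cases concrete, here is a sketch under some assumptions: `keywordThenName` is a hypothetical pattern that insists on the `function` keyword followed by white space, and `tolerant` is a variant that also allows block comments in the gaps. Neither is a complete treatment of comments (line comments and comments inside strings are ignored here).

```javascript
// Hypothetical helper: collect capture group 1 from every match.
function namesIn(source, pattern) {
  return Array.from(source.matchAll(pattern), (m) => m[1]);
}

// Assumed pattern: keyword, white space, name, optional white space, "(".
// In JavaScript, \s also matches the line terminator characters.
const keywordThenName = /function\s+([_$a-zA-Z][_$a-zA-Z0-9]*)\s*\(/g;

// Variant that also accepts /* ... */ comments wherever white space is allowed.
const tolerant =
  /function(?:\s|\/\*[\s\S]*?\*\/)+([_$a-zA-Z][_$a-zA-Z0-9]*)(?:\s|\/\*[\s\S]*?\*\/)*\(/g;

const newlineCase = 'function\na() {\n\n}';
const commentCase = 'function /*Why is this comment here!!!*/ a() {\n\n}';

console.log(namesIn(newlineCase, keywordThenName)); // [ 'a' ]
console.log(namesIn(commentCase, keywordThenName)); // []
console.log(namesIn(commentCase, tolerant));        // [ 'a' ]
```

Allowing for a single kind of comment already doubles the length of the pattern.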

So we are left with some good news and some bad news. The good news is that my simple regexp will cover most of the common cases. As long as the file is written in a sane manner, it should work just fine. The bad news is that if you want to cover every corner case, you will be left with a monstrosity of a regexp.
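
It may help to see what the pattern picks up on a small, conventionally written file. In this sketch the sample source, the `namesIn` helper, and the `anchored` variant are assumptions for illustration. The pattern from the top of this answer is used unchanged apart from the added `g` flag. Since nothing ties it to the `function` keyword, it finds the declared names but also reports call sites and keywords such as `if`.

```javascript
// The pattern from the top of this answer, with the g flag added for matchAll.
const original = /(?!function\s+)([_$a-zA-Z][_$a-zA-Z0-9]*)(?=\s*\()/g;

// Assumed variant: require the function keyword before the name.
const anchored = /function\s+([_$a-zA-Z][_$a-zA-Z0-9]*)\s*\(/g;

// Made-up sample source for the demonstration.
const source = [
  'function add(a, b) {',
  '  return a + b;',
  '}',
  'function main() {',
  '  if (add(1, 2) > 2) {',
  '    log("big");',
  '  }',
  '}',
].join('\n');

// Hypothetical helper: collect capture group 1 from every match.
const namesIn = (src, re) => Array.from(src.matchAll(re), (m) => m[1]);

const fromOriginal = namesIn(source, original);
const fromAnchored = namesIn(source, anchored);

console.log(fromOriginal); // [ 'add', 'main', 'if', 'add', 'log' ]
console.log(fromAnchored); // [ 'add', 'main' ]
```

The anchored variant gives up on function expressions assigned to variables or object properties, so which trade-off is acceptable depends on what the names are needed for.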

Note

I just wanted to say that the regexp to match a valid function identifier would be particularly horrendous.
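
For a sense of the gap, here is a sketch comparing the ASCII-only identifier used above with a Unicode-aware one. It assumes an engine that supports Unicode property escapes with the `u` flag (ES2018 and later), which keeps the pattern short. It still ignores `\uXXXX` escapes inside identifiers and does not exclude reserved words, so it is only an approximation.

```javascript
// ASCII-only identifier, matching the character classes used in the pattern above.
const asciiIdentifier = /^[_$a-zA-Z][_$a-zA-Z0-9]*$/;

// Approximate Unicode-aware identifier: ID_Start plus $ and _ at the start,
// then ID_Continue plus $, ZWNJ (U+200C) and ZWJ (U+200D).
const unicodeIdentifier = /^[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*$/u;

// Made-up sample names; the escapes spell "café" and the Greek letter pi.
const samples = ['total', '$el', '_private', 'caf\u00e9', '\u03c0', '1abc', 'a-b'];

const report = samples.map((name) => ({
  name,
  ascii: asciiIdentifier.test(name),
  unicode: unicodeIdentifier.test(name),
}));

console.log(report);
```

Names such as `café` are valid identifiers but are rejected by the ASCII-only pattern.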

ChaosPandion
Thanks. Will it work for all possible cases?
programmer4programming
No. JavaScript syntax is far too complex for regex to parse it reliably.
bobince
And not only are regular expressions inadequate for the task of parsing JavaScript; in the general case, locating the "names" of all the functions in a source file is not always even a meaningful concept. Because JavaScript functions are *values*, looking for the names of all the functions in a module is like looking for the names of all the numbers in a module.
Pointy
@bobince - You would be surprised at how close you can get, though we would never reach 100%. Depending on the project requirements, writing a proper tokenizer may not be worth the effort. In my experience it would take a minimum of 350 lines *(F# using parser combinators)* and anywhere from 1000 to 5000 lines of C# *(parser generators tend to generate huge code files; if you write it manually you can get away with 1000)*.
ChaosPandion