What would be a regular expression which I can use to match a valid JavaScript function name...

E.g. myfunction would be valid but my<\fun\>ction would be invalid.

[a-zA-Z0-9_])?
A: 
maerics
According to http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf a name can also start with a '$', a '_', or a Unicode escape sequence, though I'm not sure why one would want that in a name.
eulerfx
Wow... I did not think about reserved words. I wonder how my app would behave if I used reserved words in callback function names. But thanks: /^[A-Za-z][A-Za-z0-9_]*$/ should be good for a start.
Zoom Pat
Thanks for the link! I didn't realize that "$" and "_" are valid starters, but I should have known from using libraries like Prototype and jQuery...
maerics
A: 

This should be very easy. Valid function names can only consist of alphanumerics, parentheses, and possibly parameter values within the parens (I don't know enough JavaScript to know whether parameters are defined in the function call) and must start with a letter, correct? Therefore, to validate that a string is a valid function name, this should work:

[a-zA-Z]+[a-zA-Z0-9_]*(\(.*?\))*

ennuikiller
Thanks!! My bad... I was not clear, but I am not validating parentheses or arguments, just the function name: 'myfunction' and not 'myfunction()'.
Zoom Pat
+2  A: 

What you want is close to, or perhaps, impossible -- I haven't analyzed the grammar to know for sure which.

First, take a look at the ECMAScript grammar for identifiers. You can see one on the ANTLR site. Scroll down to where it defines identifiers:

identifierName:
    // snip full comment
    identifierStart (identifierPart)*
    ;

identifierStart:
    unicodeLetter
    | DOLLAR
    | UNDERSCORE
    | unicodeEscapeSequence
    ;

The grammar uses an EBNF, so you'll need to follow those two non-terminals: identifierStart and identifierPart. The main problem you'll run into is that you need to take into account much of Unicode and its escape sequences.

For example, with identifierStart, we see that the regular expression will need to allow a letter, a dollar sign, an underscore, or a Unicode escape sequence as the first 'character'.

Thus, you could start your regular expression:

"[$_a-zA-Z]..."

Of course, you'll need to change a-zA-Z to support all of Unicode and then augment the expression to support the Unicode Escape Sequence, but hopefully that gives you a start on the process.

Of course, if you only need a rough approximation, many of the other responses provide a rough regular expression that handles a small subset of what's actually allowed.
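
As a rough sketch of that starting point, the snippet below builds an ASCII-only approximation of identifierStart (identifierPart)* and also accepts a literal \uXXXX escape in either position. The names UNICODE_ESCAPE, ROUGH_IDENTIFIER and looksLikeIdentifierName are made up for illustration, and treating identifierPart as "identifierStart plus ASCII digits" is an assumption; the real grammar allows far more.

```javascript
// Rough ASCII-only approximation of: identifierStart (identifierPart)*
// Assumption: identifierPart is treated as identifierStart plus ASCII digits.
// The full grammar allows many more Unicode characters than this.

// Matches a literal backslash-u escape (e.g. \u0061) as written in source text.
// It does not check that the escaped code point is itself a valid identifier character.
const UNICODE_ESCAPE = '\\\\u[0-9a-fA-F]{4}';

const START = '(?:[$_a-zA-Z]|' + UNICODE_ESCAPE + ')';
const PART = '(?:[$_a-zA-Z0-9]|' + UNICODE_ESCAPE + ')';

const ROUGH_IDENTIFIER = new RegExp('^' + START + PART + '*$');

// Hypothetical helper: true if text fits the rough pattern above.
function looksLikeIdentifierName(text) {
  return ROUGH_IDENTIFIER.test(text);
}
```

With this pattern, names such as myfunction, $ and _private1 are accepted, while my<\fun\>ction and 1abc are rejected. It says nothing about reserved words and rejects valid non-ASCII names such as Äpfel.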

Kaleb Pederson
The Unicode Escape Sequence isn't part of the identifier itself, it's just a way to write the identifier in source code.
bobince
Point taken. I assumed, perhaps incorrectly, that he might be looking at source code, or something that might be eval'd.
Kaleb Pederson
Yep, that's a possibility too. Meant only as a minor clarification!
bobince
+4  A: 

This is more complicated than you might think. According to the ECMAScript standard, an identifier is:

an IdentifierName that is not a ReservedWord

so first you would have to check that the identifier is not one of:

instanceof typeof break do new var case else return void catch finally
continue for switch while this with debugger function throw default if
try delete in

and potentially some others in the future.

An IdentifierName starts with:

a letter
the $ sign
the _ underscore

and can further comprise any of those characters plus:

a number
a combining diacritical (accent) character
various joiner punctuation and zero-width spaces

These characters are defined in terms of Unicode character classes, so [A-Z] is incomplete. Ä is a letter; ξ is a letter. You can use all of those in identifiers including those used for function names.

Unfortunately, JavaScript RegExp is not Unicode-aware. If you say \w you only get the ASCII alphanumerics. There is no feasible way to check the validity of non-ASCII identifier characters short of carrying around the relevant parts of the Unicode Character Database with your script, which would be very large and clumsy.

You could try simply allowing all non-ASCII characters, for example:

^[_$a-zA-Z\xA0-\uFFFF][_$a-zA-Z0-9\xA0-\uFFFF]*$
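
A minimal sketch of how the two checks could be combined, assuming the reserved-word list above is the one you care about (newer editions of the standard reserve more words). The names RESERVED_WORDS, PERMISSIVE_IDENTIFIER and isPlausibleFunctionName are hypothetical, chosen only for this example.

```javascript
// Reserved words exactly as listed above; treat the set as incomplete,
// since later editions of the standard add more.
const RESERVED_WORDS = new Set([
  'instanceof', 'typeof', 'break', 'do', 'new', 'var', 'case', 'else',
  'return', 'void', 'catch', 'finally', 'continue', 'for', 'switch',
  'while', 'this', 'with', 'debugger', 'function', 'throw', 'default',
  'if', 'try', 'delete', 'in',
]);

// Permissive pattern from above: ASCII rules, plus any character from
// U+00A0 to U+FFFF accepted without checking its Unicode class.
const PERMISSIVE_IDENTIFIER = /^[_$a-zA-Z\xA0-\uFFFF][_$a-zA-Z0-9\xA0-\uFFFF]*$/;

// Hypothetical helper: passes the permissive pattern and is not a listed reserved word.
function isPlausibleFunctionName(name) {
  return PERMISSIVE_IDENTIFIER.test(name) && !RESERVED_WORDS.has(name);
}
```

This accepts myfunction, Äpfel and ξ, and rejects typeof, 9lives and my<\fun\>ction. It deliberately over-accepts: a no-break space (U+00A0), for instance, passes the pattern even though it is not a valid identifier character. Engines that support the `u` flag with Unicode property escapes such as \p{ID_Start} and \p{ID_Continue} can be stricter, which this sketch does not attempt.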
bobince
Nice answer bobince!
Kaleb Pederson