Is there any way to match a function block in JavaScript source code using regular expressions?

(Really I'm trying to find the opposite of that, but I figured this would be a good place to start.)

+3  A: 

No, it is not possible. Regexes can't match arbitrarily nested pairs of characters, so something like this would fool one:

function foo() {
    if(bar) {
        baz();
    } // oops, regex would think this was end of function
}
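
As a rough illustration of that failure, here is a minimal sketch using two naive patterns. The patterns and the extra qux function are made up for the demonstration, and both assume a plain "function name(...)" header with no braces in strings or comments.

```javascript
// The example above, followed by a second (made-up) function.
const source = [
  "function foo() {",
  "    if(bar) {",
  "        baz();",
  "    }",
  "}",
  "function qux() { return 1; }",
].join("\n");

// Lazy body: stops at the first "}" it finds, which closes the if block.
const lazy = /function\s+\w+\s*\([^)]*\)\s*\{[\s\S]*?\}/;
// Greedy body: runs on to the last "}" in the input, swallowing qux too.
const greedy = /function\s+\w+\s*\([^)]*\)\s*\{[\s\S]*\}/;

const lazyMatch = source.match(lazy)[0];
const greedyMatch = source.match(greedy)[0];

console.log(lazyMatch);    // foo, cut off after the inner "}"
console.log(greedyMatch);  // foo and qux together
```

Neither quantifier knows which "}" belongs to which "{", so one matches too little and the other too much.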

However, you could create a fairly simple grammar to do it (in EBNF-ish form):

javascript_func
: "function" ID "(" ")" "{" body* "}"
| "function" ID "(" params ")" "{" body* "}"
;

params
: ID
| params "," ID
;

body
: [^{}]+ // assume this is like a regex; "+" so that body* cannot loop on empty matches
| "{" body* "}"
;

Oh, this is also assuming you have some kind of lexer to strip out whitespace and comments.
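
The grammar above could be turned into code along these lines. This is only a sketch: skipBlock and matchFunction are hypothetical names, the header is checked with an ordinary regex (that part is regular), and the body is handled by recursion. It assumes there are no braces inside strings or comments, since there is no lexer here.

```javascript
// Hypothetical helper: returns the index just past the "}" that closes the
// block opened at source[open], or -1 if the block never closes.
// Follows the grammar's body rule: non-brace text, or a nested "{" body* "}".
function skipBlock(source, open) {
  let i = open + 1;                  // step past the opening "{"
  while (i < source.length) {
    if (source[i] === "{") {
      i = skipBlock(source, i);      // nested block: recurse
      if (i === -1) return -1;
    } else if (source[i] === "}") {
      return i + 1;                  // the matching close
    } else {
      i++;                           // plain non-brace character
    }
  }
  return -1;                         // ran out of input
}

// Hypothetical helper: matches "function" ID "(" params? ")" "{" body* "}"
// starting exactly at `start`. Returns the matched text, or null.
function matchFunction(source, start = 0) {
  // Sticky flag: the header must begin at lastIndex, not somewhere later.
  const header =
    /function\s+[A-Za-z_$][\w$]*\s*\(\s*(?:[A-Za-z_$][\w$]*(?:\s*,\s*[A-Za-z_$][\w$]*)*)?\s*\)\s*\{/y;
  header.lastIndex = start;
  const m = header.exec(source);
  if (!m) return null;
  const open = start + m[0].length - 1;   // index of the opening "{"
  const end = skipBlock(source, open);
  return end === -1 ? null : source.slice(start, end);
}

const code = "function foo(a, b) {\n  if (a) { b(); }\n}\nfunction next() {}";
console.log(matchFunction(code));
```

The call at the end returns only foo, including its nested if block, and stops before the function that follows it.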

Zifre
@Zifre I'm not giving up, but I hope this is incorrect!
leeand00
Actually, a greedy regex would match the whole function. However, if another function followed it, it would be grabbed too.
GalacticCowboy
Oh, and you can have nested functions too... (function definition within another)
GalacticCowboy
@GalacticCowboy: I assumed the greediness problem would be fairly obvious, but you are correct.
Zifre
+5  A: 

There are certain things that regular expressions just aren't very good at. That doesn't mean it's impossible to build an expression that will work, just that it's probably not a good fit. Among those things:

  • multi-line input
  • nesting

JavaScript function blocks tend to cover multiple lines, and you are going to want to find the matching "{" and "}" braces that mark the start and end of the block, which could be nested to an unknown depth. You also need to account for braces that appear inside comments. A regex will be painful for this.

That doesn't mean it's impossible, though. You might have additional information about the nature of the functions you're looking for. If you can do things like guarantee no braces in comments and limit nesting to a specific depth, you could still build an expression to do it. It'll be somewhat messy and hard to maintain, but at least within the realm of the possible.
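
To make the depth-limited idea concrete, here is one way such an expression might be generated rather than written by hand. blockPattern and functionRegex are hypothetical helpers, and the sketch assumes exactly the guarantees mentioned above: no braces in comments or strings, and a known maximum nesting depth.

```javascript
// Hypothetical helper: builds regex source for a "{...}" block that may
// contain at most `depth` further levels of nested braces.
function blockPattern(depth) {
  let inner = "[^{}]*";                        // depth 0: no nested braces
  for (let i = 0; i < depth; i++) {
    // Each pass allows one more level: non-brace text or a shallower block.
    inner = `(?:[^{}]|\\{${inner}\\})*`;
  }
  return `\\{${inner}\\}`;
}

// Hypothetical helper: a global regex for named functions whose bodies
// nest no deeper than `depth`.
function functionRegex(depth) {
  return new RegExp(
    `function\\s+[A-Za-z_$][\\w$]*\\s*\\([^)]*\\)\\s*${blockPattern(depth)}`,
    "g"
  );
}

const sample =
  "function foo() {\n  if (bar) {\n    baz();\n  }\n}\nfunction qux() { return 1; }";

const shallow = sample.match(functionRegex(0));  // only qux fits
const deeper = sample.match(functionRegex(1));   // foo and qux both fit
console.log(shallow.length, deeper.length);
```

The generated pattern grows with every extra level, which is where the "messy and hard to maintain" part comes from, and any function nested deeper than the chosen limit is silently skipped.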

Joel Coehoorn
How is multi-line input a problem?
Zifre
It depends on the engine: some just don't support it; others have bugs.
Joel Coehoorn
Isn't that more of a problem of the engines than regexes?
Zifre
You could say that. I see it as more of an issue with the technology: multi-line is just one more thing you have to watch for.
Joel Coehoorn
+3  A: 

Not really, no.

Function blocks aren't regular and so regular expressions aren't the right tool for the job. See, in order to capture a function block in JS, you need to count instances of { and balance them against instances of }; otherwise you're going to match too much or too little. Regular expressions can't do this kind of counting.

Just read in the file you're trying to look at and manage the nesting recursively. It's conceptually very easy to manage this way.
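
A rough sketch of that counting is below. It uses an explicit depth counter instead of recursion (the two are interchangeable here), and findMatchingBrace is a hypothetical name. As an extra, it skips comments and quoted strings so that stray braces in them are not counted. Regex literals and template literals are not handled, so treat it as a starting point rather than a parser.

```javascript
// Hypothetical helper: given the index of an opening "{", returns the index
// of its matching "}", or -1 if the braces never balance.
function findMatchingBrace(source, open) {
  if (source[open] !== "{") return -1;
  let depth = 0;
  let i = open;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === "/" && next === "/") {
      // Line comment: skip to the end of the line.
      const eol = source.indexOf("\n", i);
      if (eol === -1) return -1;
      i = eol + 1;
      continue;
    }
    if (ch === "/" && next === "*") {
      // Block comment: skip past the closing "*/".
      const close = source.indexOf("*/", i + 2);
      if (close === -1) return -1;
      i = close + 2;
      continue;
    }
    if (ch === '"' || ch === "'") {
      // Quoted string: skip to the closing quote, honoring backslash escapes.
      i++;
      while (i < source.length && source[i] !== ch) {
        if (source[i] === "\\") i++;
        i++;
      }
      i++;
      continue;
    }
    if (ch === "{") depth++;
    if (ch === "}") {
      depth--;
      if (depth === 0) return i;   // back to the starting level: done
    }
    i++;
  }
  return -1;
}

const text =
  "function foo() {\n  // stray } in a comment\n  if (bar) { baz('}'); }\n}\nvar after = 1;";
const start = text.indexOf("{");
const end = findMatchingBrace(text, start);
console.log(text.slice(start, end + 1));
```

Everything from start to end is the function body; whatever lies outside such ranges is the "opposite" the question asks about.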

Welbog