ansaurus

Question

Extracting nested function names from a JavaScript function

Answer 1

A:

<pre>
<script type="text/javascript">
function someFn() {
 /**
  * Some comment
  */
  function fn1() {
   alert("/*This is not a comment, it's a string literal*/");
  }

  function // keyword
  fn2 // name
  (x, y) // arguments
  {
   /*
   body
   */
  }

  function fn3() {
  alert("this is the word function in a string literal");
  }

  var f = function () { // anonymous, ignore
  };
}

var s = someFn.toString();
// remove inline comments
s = s.replace(/\/\/.*/g, "");
// compact all whitespace to a single space
s = s.replace(/\s+/g, " ");
// remove all block comments, including those in string literals
s = s.replace(/\/\*.*?\*\//g, "");
document.writeln(s);
// remove string literals to avoid false matches with the keyword 'function'
s = s.replace(/'.*?'/g, "");
s = s.replace(/".*?"/g, "");
document.writeln(s);
// find all the function definitions
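var matches = s.match(/\bfunction\b(.*?)\(/g) || [];
// start at 1: matches[0] is the enclosing function (someFn) itself
for (var ii = 1; ii < matches.length; ++ii) {
 // extract the function name (empty for anonymous functions)
 var funcName = matches[ii].replace(/\bfunction\b(.*)\(/, "$1");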
 // remove any remaining leading or trailing whitespace
 funcName = funcName.replace(/\s+$|^\s+/g, "");
 if (funcName === '') {
  // anonymous function, discard
  continue;
 }
 // output the results
 document.writeln('[' + funcName + ']');
}
</script>
</pre>

I'm sure I missed something, but from your requirements in the original question, I think I've met the goal, including getting rid of the possibility of finding the function keyword in string literals.

One last point: I don't see any problem with mangling the string literals in the function bodies. Your requirement was to find the function names, so I didn't bother trying to preserve the function content.

Grant Wagner 2009-02-05 21:23:44

I think this will break if comments and strings don't nest 'properly' - imo there's no way around manually parsing the source code...

Christoph 2009-02-05 21:38:19

You can assume the nesting is proper because I'm parsing an already compiled (valid) JavaScript function.

Ates Goral 2009-02-05 22:06:07

@Ates: with 'nested improperly' I meant things like ` // " <NEWLINE> " `, ` /* " */ " `,...

Christoph 2009-02-05 22:16:01
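
A failure in the same spirit can be reproduced with a small sketch. The helper `namesByRegex` below is hypothetical: it applies the same replace steps as the answer above to a source string instead of writing to the document, and the two sample sources are made up. An apostrophe inside a double-quoted literal is enough for the single-quote pass to swallow a real declaration:

```javascript
// Hypothetical helper (not from the answer): the same regex steps, applied to a string.
function namesByRegex(source) {
    var s = source
        .replace(/\/\/.*/g, "")         // strip line comments
        .replace(/\s+/g, " ")           // collapse whitespace onto one line
        .replace(/\/\*.*?\*\//g, "")    // strip block comments
        .replace(/'.*?'/g, "")          // strip single-quoted literals
        .replace(/".*?"/g, "");         // strip double-quoted literals
    var matches = s.match(/\bfunction\b(.*?)\(/g) || [];
    var names = [];
    // matches[0] is the enclosing function itself, so start at 1
    for (var i = 1; i < matches.length; ++i) {
        var name = matches[i].replace(/^function|\($/g, "").replace(/^\s+|\s+$/g, "");
        if (name) names.push(name);
    }
    return names;
}

var plain = [
    "function outer() {",
    "  function real() {}",
    "}"
].join("\n");

var tricky = [
    "function outer() {",
    "  var a = \"it's\";",
    "  function real() {}",
    "  var b = \"that's\";",
    "}"
].join("\n");

console.log(namesByRegex(plain));  // one name: "real"
console.log(namesByRegex(tricky)); // empty: `real` vanished with the bogus literal 's ... that'
```

The text between the two apostrophes is treated as one string literal, so everything declared between them is lost before the `function` search even starts.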

Answer 2

+3 A:

Cosmetic changes and bugfix

The regular expression must read \bfunction\b to avoid false positives!

Functions defined in blocks (e.g. in the bodies of loops) will be ignored if nested does not evaluate to true.
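
To illustrate the first point, here is a minimal sketch (the sample source string is made up) showing that a bare `function` pattern also matches inside longer identifiers, while the `\b`-anchored form does not:

```javascript
// Made-up sample: one identifier that merely contains "function", one real declaration.
var src = "var malfunction = 1; function real() {}";

var loose = src.match(/function/g);       // also hits the tail of "malfunction"
var strict = src.match(/\bfunction\b/g);  // only the keyword

console.log(loose.length);  // 2
console.log(strict.length); // 1
```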

function tokenize(code) {
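    // remove escape sequences (\" \' \\ ...) so they cannot be mistaken for delimiters
    code = code.split(/\\./).join('');

    // whitespace runs stop at newlines so that '\n' is always a token of its own
    var regex = /\bfunction\b|\(|\)|\{|\}|\/\*|\*\/|\/\/|"|'|\n|[^\S\n]+/mg,
        tokens = [],
        pos = 0;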

    for(var matches; matches = regex.exec(code); pos = regex.lastIndex) {
        var match = matches[0],
            matchStart = regex.lastIndex - match.length;

        if(pos < matchStart)
            tokens.push(code.substring(pos, matchStart));

        tokens.push(match);
    }

    if(pos < code.length)
        tokens.push(code.substring(pos));

    return tokens;
}

var separators = {
    '/*' : '*/',
    '//' : '\n',
    '"' : '"',
    '\'' : '\''
};

function extractInnerFunctionNames(func, nested) {
    var names = [],
        tokens = tokenize(func.toString()),
        level = 0;

    for(var i = 0; i < tokens.length; ++i) {
        var token = tokens[i];

        switch(token) {
            case '{':
            ++level;
            break;

            case '}':
            --level;
            break;

            case '/*':
            case '//':
            case '"':
            case '\'':
            // skip everything up to the matching closing delimiter
            var sep = separators[token];
            while(++i < tokens.length && tokens[i] !== sep);
            break;

            case 'function':
            if(level === 1 || (nested && level)) {
                while(++i < tokens.length) {
                    token = tokens[i];

                    if(token === '(')
                        break;

                    if(/^\s+$/.test(token))
                        continue;

                    if(token === '/*' || token === '//') {
                        var sep = separators[token];
                        while(++i < tokens.length && tokens[i] !== sep);
                        continue;
                    }

                    names.push(token);
                    break;
                }
            }
            break;
        }
    }

    return names;
}

Christoph 2009-02-05 22:14:23
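
To make the token stream concrete, here is a sketch of the same tokenizing idea. The name `tokenizeSketch` and the sample inputs are illustrative, escape handling is left out, and whitespace runs are deliberately split at newlines (an assumption of this sketch) so that `'\n'` always appears as its own token and can terminate a `//` comment:

```javascript
// Sketch of the tokenizing idea; names and inputs are illustrative.
function tokenizeSketch(code) {
    // Delimiters become their own tokens; whitespace runs stop at newlines.
    var regex = /\bfunction\b|\(|\)|\{|\}|\/\*|\*\/|\/\/|"|'|\n|[^\S\n]+/g,
        tokens = [],
        pos = 0,
        match;

    while ((match = regex.exec(code))) {
        // text between two delimiters (identifiers, operators, ...) is kept as one token
        if (pos < match.index)
            tokens.push(code.substring(pos, match.index));

        tokens.push(match[0]);
        pos = regex.lastIndex;
    }

    // trailing text after the last delimiter
    if (pos < code.length)
        tokens.push(code.substring(pos));

    return tokens;
}

console.log(tokenizeSketch("function a(){}"));
// ["function", " ", "a", "(", ")", "{", "}"]
console.log(tokenizeSketch("a // b \nc"));
// ["a", " ", "//", " ", "b", " ", "\n", "c"]
```

With tokens shaped like this, finding a name amounts to walking forward from a `function` token and skipping whitespace and comment tokens until the first other token appears.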

@Peter: should work now

Christoph 2009-02-06 00:25:53

Yep, that appears to work here now.

Peter Boughton 2009-02-07 15:55:08

Thanks for this answer Christoph. I'll write some unit tests to see if it meets all scenarios. I'm also initiating a bounty to see if anyone can come up with a shorter solution.

Ates Goral 2009-02-09 18:01:46

Just came up with this alternative: http://stackoverflow.com/questions/517411/extracting-nested-function-names-from-a-javascript-function/546984#546984

Ates Goral 2009-02-13 20:54:03

Functions declared inside loops aren't really "nested", just like "var" declarations inside loops aren't really nested. The functions are visible outside the loop too.

Pointy 2010-02-14 03:20:53

Answer 3

+3 A:

The academically correct way to handle this would be creating a lexer and parser for a subset of JavaScript (the function definition), generated by a formal grammar (see this link on the subject, for example).

Take a look at JS/CC, a JavaScript parser generator.

Other solutions are just regex hacks that lead to unmaintainable, unreadable code and probably to hidden parsing errors in particular cases.

As a side note, I'm not sure I understand why you aren't specifying the list of unit test functions in your product in a different way (an array of functions?).

friol 2009-02-09 19:07:08

jsUnity supports a variety of formats, including an array of functions. The thing I like about the closure syntax is its compactness and resemblance to jUnit tests.

Ates Goral 2009-02-09 22:01:29

JS/CC looks very interesting and seems to be the right path in achieving what I want.

Ates Goral 2009-02-12 05:51:09

Answer 4

A:

Would it matter if you defined your tests like:

var tests = {
 test1: function (){
  console.log( "test 1 ran" );
 },

 test2: function (){
  console.log( "test 2 ran" );
 },

 test3: function (){
  console.log( "test 3 ran" );
 }
};

Then you could run them as easily as this:

for( var test in tests ){ 
 tests[test]();
}

That looks much easier. You can even carry the tests around as a plain object literal that way.

Mehmet Duran 2009-02-10 12:30:35
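
With a suite shaped like this, recovering the test names needs no source parsing at all. A minimal sketch, assuming an object suite like the one above (the `collectTestNames` helper and the sample suite are hypothetical, not part of jsUnity):

```javascript
// Hypothetical helper: list the own, function-valued properties of a suite object.
function collectTestNames(suite) {
    var names = [];
    for (var key in suite) {
        if (Object.prototype.hasOwnProperty.call(suite, key) &&
            typeof suite[key] === "function") {
            names.push(key);
        }
    }
    return names;
}

var sampleSuite = {
    test1: function () { return "test 1 ran"; },
    test2: function () { return "test 2 ran"; },
    notATest: 42
};

console.log(collectTestNames(sampleSuite)); // ["test1", "test2"]
```

The `hasOwnProperty` guard keeps inherited properties out of the list, which a bare `for...in` loop would otherwise include.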

@Mehmet: This is in fact a syntax already supported by jsUnity: http://code.google.com/p/jsunity/wiki/ObjectTestSuite

Ates Goral 2009-02-12 05:28:08

Answer 5

+1 A:

I like what you're doing with jsUnity. And when I see something I like (and have enough free time ;)), I try to reimplement it in a way which better suits my needs (also known as 'not-invented-here' syndrome).

The result of my efforts is described in this article; the code can be found here.

Feel free to rip out any parts you like - you can assume the code to be in the public domain.

Christoph 2009-02-10 14:25:53

This looks very interesting! I guess it's legal in JS to repeat the same label? I'll hopefully apply your answer to jsUnity some time soon. And thanks for the nod ;)

Ates Goral 2009-02-12 05:23:41

@Ates: ECMA-262, 3rd edition, 12.12: labels are added to the label set of the statement they prefix (ie the strings in this case); it's only illegal to nest statements with the same label, eg `foo: while(true) { foo: "bar"; }`

Christoph 2009-02-12 11:23:48
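
The rule can be checked without running anything, since a duplicate nested label is reported when the code is parsed. A small sketch (the `compiles` helper is made up for illustration; it only asks the engine to parse a function body):

```javascript
// Hypothetical helper: true if `body` parses as a function body, false on a SyntaxError.
function compiles(body) {
    try {
        new Function(body); // parsed, but never called
        return true;
    } catch (e) {
        if (e instanceof SyntaxError) return false;
        throw e;
    }
}

// The same label on sibling statements is fine...
console.log(compiles('test: "first"; test: "second";'));    // true
// ...but nesting a statement under a label it already carries is rejected.
console.log(compiles('foo: while (true) { foo: "bar"; }')); // false
```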

Answer 6

+1 A:

The trick is to basically generate a probe function that will check if a given name is the name of a nested (first-level) function. The probe function uses the function body of the original function, prefixed with code to check the given name within the scope of the probe function. OK, this can be better explained with the actual code:

function splitFunction(fn) {
    var tokens =
        /^[\s\r\n]*function[\s\r\n]*([^\(\s\r\n]*?)[\s\r\n]*\([^\)\s\r\n]*\)[\s\r\n]*\{((?:[^}]*\}?)+)\}\s*$/
        .exec(fn);

    if (!tokens) {
        throw new Error("Invalid function.");
    }

    return {
        name: tokens[1],
        body: tokens[2]
    };
}

var probeOutside = function () {
    return eval(
        "typeof $fn$ === \"function\""
        .split("$fn$")
        .join(arguments[0]));
};

function extractFunctions(fn) {
    var fnParts = splitFunction(fn);

    var probeInside = new Function(
        splitFunction(probeOutside).body + fnParts.body);

    var tokens;
    var fns = [];
    var tokenRe = /(\w+)/g;

    while ((tokens = tokenRe.exec(fnParts.body))) {
        var token = tokens[1];

        try {
            if (probeInside(token) && !probeOutside(token)) {
                fns.push(token);
            }
        } catch (e) {
            // ignore token
        }
    }

    return fns;
}

Runs fine against the following on Firefox, IE, Safari, Opera and Chrome:

function testGlobalFn() {}

function testSuite() {
    function testA() {
        function testNested() {
        }
    }

    // function testComment() {}
    // function testGlobalFn() {}

    function // comments
    testB /* don't matter */
    () // neither does whitespace
    {
        var s = "function testString() {}";
    }
}

document.write(extractFunctions(testSuite));
// writes "testA,testB"
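
The core of the probe can be shown in a few lines. This is a simplified sketch rather than the code above: `makeProbe` and the sample body are made up, and the name is passed as a regular parameter instead of through `arguments[0]`. Because function declarations are hoisted, the `eval` at the top of the generated function already sees every first-level function declared in the appended body, while the body itself never runs:

```javascript
// Hypothetical miniature of the probing idea.
function makeProbe(body) {
    // The generated function returns before `body` executes, but declarations
    // in `body` are hoisted into its scope and visible to the direct eval.
    return new Function(
        "name",
        "return eval('typeof ' + name) === 'function';\n" + body);
}

var probe = makeProbe(
    "function testA() { function testNested() {} }\n" +
    "var notAFunction = 1;");

console.log(probe("testA"));        // true
console.log(probe("testNested"));   // false, only visible inside testA
console.log(probe("notAFunction")); // false, the assignment never runs
console.log(probe("missing"));      // false, typeof tolerates undeclared names
```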

Edit by Christoph, with inline answers by Ates:

Some comments, questions and suggestions:

Is there a reason for checking
```
typeof $fn$ !== "undefined" && $fn$ instanceof Function
```
instead of using
```
typeof $fn$ === "function"
```
instanceof is less safe than using typeof because it will fail when passing objects between frame boundaries. I know that IE returns wrong typeof information for some built-in functions, but afaik instanceof will fail in these cases as well, so why the more complicated but less safe test?

[AG] There was absolutely no legitimate reason for it. I've changed it to the simpler "typeof === function" as you suggested.

How are you going to prevent the wrongful exclusion of functions for which a function with the same name exists in the outer scope, e.g.
```
function foo() {}


function TestSuite() {
    function foo() {}
}
```

[AG] I have no idea. Can you think of anything? Which one do you think is better: (a) wrongful exclusion of a function inside, or (b) wrongful inclusion of a function outside?

I'm starting to think that the ideal solution would be a combination of your solution and this probing approach: figure out the real function names that are inside the closure, then use probing to collect references to the actual functions (so that they can be called directly from outside).
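
The shadowing problem is easy to reproduce in isolation. In this sketch (all names are illustrative) both probes report a function called `foo`, so a filter of the form `inside && !outside` cannot tell that the inner `foo` is a different function:

```javascript
function foo() {}  // a function in the outer scope

// Probe that can only see the outer scope.
var outside = function (name) {
    return eval("typeof " + name) === "function";
};

// Probe whose scope also contains a suite body that declares its own foo.
var inside = new Function(
    "name",
    "return eval('typeof ' + name) === 'function';\n" +
    "function foo() {}\nfunction bar() {}");

function isReported(name) {
    return inside(name) && !outside(name);
}

console.log(isReported("bar")); // true
console.log(isReported("foo")); // false, although the suite does declare foo
```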

It might be possible to modify your implementation so that the function's body only has to be eval()'ed once and not once per token, which is rather inefficient. I might try to see what I can come up with when I have some more free time today...

[AG] Note that the entire function body is not eval'd. It's only the bit that's inserted at the top of the body.

[CG] You're right - the function's body only gets parsed once during the creation of probeInside - you did some nice hacking, there ;). I have some free time today, so let's see what I can come up with...

A solution that uses your parsing method to extract the real function names could just use one eval to return an array of references to the actual functions:

return eval("[" + fnList + "]");
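
A sketch of that idea, with the caveat that it swaps the `eval` for the `Function` constructor so the snippet stays self-contained; `collectRefs` and the sample body are hypothetical. The name list is appended to the body once, and a single call hands back references to the inner functions:

```javascript
// Hypothetical helper: compile `body` once and return references to the named functions.
function collectRefs(body, names) {
    var code = body + "\nreturn [" + names.join(",") + "];";
    return new Function(code)();
}

var refs = collectRefs(
    "function testA() { return 'A ran'; }\n" +
    "function testB() { return 'B ran'; }",
    ["testA", "testB"]);

console.log(refs.length); // 2
console.log(refs[0]());   // "A ran"
console.log(refs[1]());   // "B ran"
```

Note that the body is executed once as a side effect of collecting the references, so any top-level statements in it will run.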

[CG] Here is what I came up with. An added bonus is that the outer function stays intact and thus may still act as a closure around the inner functions. Just copy the code into a blank page and see if it works - no guarantees on bug-freelessness ;)

<pre><script>
var extractFunctions = (function() {
    var level, names;

    function tokenize(code) {
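        // remove escape sequences (\" \' \\ ...) so they cannot be mistaken for delimiters
        code = code.split(/\\./).join('');

        // whitespace runs stop at newlines so that '\n' is always a token of its own
        var regex = /\bfunction\b|\(|\)|\{|\}|\/\*|\*\/|\/\/|"|'|\n|[^\S\n]+|\\/mg,
            tokens = [],
            pos = 0;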

        for(var matches; matches = regex.exec(code); pos = regex.lastIndex) {
            var match = matches[0],
                matchStart = regex.lastIndex - match.length;

            if(pos < matchStart)
                tokens.push(code.substring(pos, matchStart));

            tokens.push(match);
        }

        if(pos < code.length)
            tokens.push(code.substring(pos));

        return tokens;
    }

    function parse(tokens, callback) {
        for(var i = 0; i < tokens.length; ++i) {
            var j = callback(tokens[i], tokens, i);
            if(j === false) break;
            else if(typeof j === 'number') i = j;
        }
    }

    function skip(tokens, idx, limiter, escapes) {
        while(++idx < tokens.length && tokens[idx] !== limiter)
            if(escapes && tokens[idx] === '\\') ++idx;

        return idx;
    }

    function removeDeclaration(token, tokens, idx) {
        switch(token) {
            case '/*':
            return skip(tokens, idx, '*/');

            case '//':
            return skip(tokens, idx, '\n');

            case ')':
            tokens.splice(0, idx + 1);
            return false;
        }
    }

    function extractTopLevelFunctionNames(token, tokens, idx) {
        switch(token) {
            case '{':
            ++level;
            return;

            case '}':
            --level;
            return;

            case '/*':
            return skip(tokens, idx, '*/');

            case '//':
            return skip(tokens, idx, '\n');

            case '"':
            case '\'':
            return skip(tokens, idx, token, true);

            case 'function':
            if(level === 1) {
                while(++idx < tokens.length) {
                    token = tokens[idx];

                    if(token === '(')
                        return idx;

                    if(/^\s+$/.test(token))
                        continue;

                    if(token === '/*') {
                        idx = skip(tokens, idx, '*/');
                        continue;
                    }

                    if(token === '//') {
                        idx = skip(tokens, idx, '\n');
                        continue;
                    }

                    names.push(token);
                    return idx;
                }
            }
            return;
        }
    }

    function getTopLevelFunctionRefs(func) {
        var tokens = tokenize(func.toString());
        parse(tokens, removeDeclaration);

        names = [], level = 0;
        parse(tokens, extractTopLevelFunctionNames);

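        // rebuild the body and append a statement that collects the references
        var code = tokens.join('') + '\nthis._refs = [' +
            names.join(',') + '];';

        // run the rebuilt body as a constructor so that `this` is a fresh object
        return (new (new Function(code)))._refs;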
    }

    return getTopLevelFunctionRefs;
})();

function testSuite() {
    function testA() {
        function testNested() {
        }
    }

    // function testComment() {}
    // function testGlobalFn() {}

    function // comments
    testB /* don't matter */
    () // neither does whitespace
    {
        var s = "function testString() {}";
    }
}

document.writeln(extractFunctions(testSuite).join('\n---\n'));
</script></pre>

Not as elegant as LISP macros, but it's still nice to see what JavaScript is capable of ;)

Ates Goral 2009-02-13 17:58:48

1. why not `isFnTmp = "typeof $fn$ === \"function\""` - `instanceof` breaks across frame boundaries! - 2. how do you plan on handling `window.func = function func() {}`?

Christoph 2009-02-13 21:29:40

3. I don't think this will perform well (didn't benchmark, shame on me :( ) - you'll have to `eval()` the whole function body for each token!

Christoph 2009-02-13 21:35:04

@Christoph: Your concern #1 should be (at least to a practical extent) handled with the probeOutside addition.

Ates Goral 2009-02-13 21:53:48

@Christoph: #3: You may have misread the code; only the typeof check is eval'd, once per token. Of course, depending on the # of tokens in a given code block, this may be cumbersome. However, performance is not a concern since this is only done at test suite compilation.

Ates Goral 2009-02-13 21:56:12

@Ates: Do you have a problem with me editing your answer to add my questions there? The comments are a bit limiting...

Christoph 2009-02-13 22:00:57

@Christoph: I've wiki-ized the answer to help with the collaborative effort :)

Ates Goral 2009-02-14 01:05:30

@Ates: added my version

Christoph 2009-02-16 14:18:01
