I'm working on a large and extremely messy JavaScript file, and I would like to remove all functions from the file, ultimately creating a version which contains only data.

The code looks something like this:

var foo : bar = "hi";
function foobar (){
  //blah blah
}
var fobar:bar;
var barfo:bar;
function imSoUgly(){
  //Blah blah blah blah mr freeman
}

The regex I would like to build would find every `function ... { ... }` block and delete it, producing this:

var foo : bar = "hi";
var fobar:bar;
var barfo:bar;

I'm not quite sure where to start with this. Ideally I would like to do it with TextMate's regex support, but I'm open to other approaches.

A:

I don't think it is possible to do this with regular expressions alone, because a regular expression cannot match opening and closing braces (code blocks) that can be nested arbitrarily deep.

To do this reliably, you would need to walk recursively through all the inner code blocks to locate the end of the function, or do something equivalent (for example, count the opening and closing braces).
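A minimal sketch of the brace-counting idea, in JavaScript. It assumes the functions look like `function name(...) { ... }`, that the parameter list has no nested parentheses, and that no braces appear inside strings, regex literals or comments. A regex is used only to find where each function starts; a depth counter then finds where it ends. `stripFunctions` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper (not from any library): remove every
// `function name(...) { ... }` block from `src`. A regex finds each function
// header; a depth counter then walks forward to the matching closing brace.
// Assumes braces never appear inside strings, regex literals or comments,
// and that the parameter list contains no nested parentheses.
function stripFunctions(src) {
  const header = /function\s*\w*\s*\([^)]*\)\s*\{/g;
  let out = '';
  let last = 0; // index just past the previously removed block
  let match;
  while ((match = header.exec(src)) !== null) {
    let depth = 1; // the header already consumed the opening `{`
    let i = header.lastIndex;
    while (i < src.length && depth > 0) {
      if (src[i] === '{') depth++;
      else if (src[i] === '}') depth--;
      i++;
    }
    if (depth !== 0) {
      throw new Error('Unbalanced braces in function starting at index ' + match.index);
    }
    out += src.slice(last, match.index); // keep the data before this function
    last = i;
    header.lastIndex = i; // resume the search after the removed block
  }
  return out + src.slice(last);
}

const sample = [
  'var foo : bar = "hi";',
  'function foobar (){',
  '  if (x) { y(); }',
  '}',
  'var fobar:bar;',
].join('\n');
console.log(stripFunctions(sample));
```

The removed functions leave their surrounding newlines behind, so the output contains blank lines where the functions used to be. The nested `if (x) { y(); }` is handled correctly because the counter only stops when the depth returns to zero.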

Carko
A: 
Pessimist
A:

You can't. That being said, you could use something like this:

function\s+\w+\s*\([^)]*\)\s*{[^}]*}

but it will fail if there are any { or } characters inside the function body, and there is nothing a regular expression can do about that.
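A quick way to see both the success and the failure is to run the same pattern as a JavaScript `RegExp` with the `g` flag (so every match is replaced). In this sketch the braces are escaped, which does not change what the pattern matches, and the two sample strings are made up for illustration.

```javascript
// Diadistis's pattern with escaped braces and the `g` flag.
const noNestedBraces = /function\s+\w+\s*\([^)]*\)\s*\{[^}]*\}/g;

const simple = 'var a;\nfunction f(){\n  // blah\n}\nvar b;';
const nested = 'var a;\nfunction g(){\n  if (x) { y(); }\n  z();\n}\nvar b;';

const simpleResult = simple.replace(noNestedBraces, '');
const nestedResult = nested.replace(noNestedBraces, '');

// The brace-free body is removed cleanly.
console.log(JSON.stringify(simpleResult)); // "var a;\n\nvar b;"
// [^}]* stops at the `}` of the inner if-block, so the tail survives.
console.log(JSON.stringify(nestedResult)); // "var a;\n\n  z();\n}\nvar b;"
```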

Diadistis
It will also fail in something like: `function XYZ (parm1, (parm2 * 10)) {...}`
Brock Adams
A: 

In my opinion, a regex is not sufficient for something as complex as this. The best I could do with a regex is this:

[\r\n]function [\w ]*\(\)\{[\w\W]*?}

That will remove all the functions in your example, but if you had something like this, it wouldn't work:

function foobar (){
   if(condition){
      // do something
   } // this end brace would be mis-interpreted as the end of the function
   // bla, bla, bla
}

You would still have:

   // bla, bla, bla
}
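The culprit is the lazy `[\w\W]*?`, which stops at the first `}` it meets. A small JavaScript check of the claim above, using the same pattern with the final brace escaped and the `g` flag added; the two surrounding `var` lines are made up so the leftover is easy to see.

```javascript
// The pattern from this answer; the lazy [\w\W]*? stops at the first `}`.
const lazyFunction = /[\r\n]function [\w ]*\(\)\{[\w\W]*?\}/g;

const src = [
  'var a;',
  'function foobar (){',
  '   if(condition){',
  '      // do something',
  '   }',
  '   // bla, bla, bla',
  '}',
  'var b;',
].join('\n');

const leftover = src.replace(lazyFunction, '');
// Everything after the if-block's closing brace is left behind.
console.log(leftover);
```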

Pessimist's answer would work, but ONLY if all of the functions have no spaces before the closing line, which is unlikely to be true.

The bottom line is that you need a real JavaScript parser. A quick Google search found this:

http://www.antlr.org/

Computerish
A: 

You can't do this with a "regular" expression, but some languages provide pattern-matching constructs which allow you to match (among other things) balanced text.

For example, Perl:

/function\s*\w*\s*\(\)\s*(\{([^{}]++|(?1))*\})/

Whether it's the correct tool for the job (HINT: It probably isn't) is another question entirely.

Anon.
You still need to worry about braces inside strings.
Matthew Crumley
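One way to deal with that, sketched in JavaScript: a scanner that finds the `}` matching a given `{` while skipping string literals and comments, so braces inside them are not counted. `findMatchingBrace` is a hypothetical helper, and it still does not understand regex literals or `${...}` substitutions in template literals, which is where a real parser becomes necessary.

```javascript
// Hypothetical helper: given the index of a `{` in `src`, return the index of
// the matching `}`, ignoring braces inside '...', "...", `...`,
// // line comments and /* block comments */. Returns -1 if unbalanced.
// Not handled: regex literals and `${...}` inside template literals.
function findMatchingBrace(src, openIndex) {
  let depth = 0;
  let i = openIndex;
  while (i < src.length) {
    const c = src[i];
    if (c === '"' || c === "'" || c === '`') {
      // Skip to the closing quote, honouring backslash escapes.
      i++;
      while (i < src.length && src[i] !== c) {
        if (src[i] === '\\') i++; // skip the escaped character
        i++;
      }
    } else if (c === '/' && src[i + 1] === '/') {
      // Skip a line comment up to (not including) the newline.
      while (i < src.length && src[i] !== '\n') i++;
      continue;
    } else if (c === '/' && src[i + 1] === '*') {
      const end = src.indexOf('*/', i + 2);
      if (end === -1) return -1; // unterminated block comment
      i = end + 1; // land on the closing '/', stepped past below
    } else if (c === '{') {
      depth++;
    } else if (c === '}') {
      depth--;
      if (depth === 0) return i;
    }
    i++;
  }
  return -1;
}

const code = 'function f() { var s = "}"; }\nvar x;';
console.log(findMatchingBrace(code, code.indexOf('{'))); // 28: the `}` in the string is skipped
```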
A: 

As everyone is saying, you can't do this with a regex. And since TextMate has limited macro capability, you can't do it in TextMate alone.

What you need is a filter, which you can write in any convenient language, such as Python.

Here's one written in JavaScript; as you can see, they can be quite involved. (Note that this has been tested, but not exhaustively.)

var BadScript       =  'var foo : bar = "hi";                   \n   \
                        function foobar (){                     \n   \
                            //blah blah                         \n   \
                        }                                       \n   \
                        var fubar:bar;                          \n   \
                        var barfo:bar;                          \n   \
                        var SomeObj = {X:3, Z:(9+17)};          \n   \
                        Obj2 = {myfunc:function() {return 13;}};\n   \
                                                                \n   \
                        function imSoUgly(bReally, (1+4)){      \n   \
                            if (bReally) {                      \n   \
                                // Are too!                     \n   \
                            }                                   \n   \
                            else {                              \n   \
                                // Am not!                      \n   \
                            }                                   \n   \
                            //Blah blah blah blah mr freeman    \n   \
                        }                                       \n   \
                       ';

var GoodText        = '';
var iRawLen         = BadScript.length;
var bInFunction     = false;
var iNumBraces      = 0;
var iNumParentheses = 0;
var sFunctionState  = 'clear';  //-- We need a state machine.  More, below.
var sSuspectText    = '';
var oKeyWord        = { init:function (s) {this.name=s; this.len=s.length; this.J=0;},
                        sGet:function ()  {return this.name[this.J];},
                        reset:function () {this.J = 0;},
                        bIncr:function () {this.J += 1; if(this.J >= this.len) {this.J=0; return true;} else return false;}
                      };
oKeyWord.init ('function');


for (var K=0;  K < iRawLen;  K++)
{
    var sChar       = BadScript[K];

    if (bInFunction)
    {
        if (sChar == '{')
        {
            iNumBraces++;
        }
        else if (sChar == '}')
        {
            iNumBraces--;

            if (iNumBraces == 0)
            {
                sFunctionState  = 'clear';
                bInFunction     = false;
            }
        }
        continue;
    }

    var bInvalidFuncDeclaration = false;
    sSuspectText               += sChar;


    switch (sFunctionState)
    {
        /*--- Our actions vary depending on one of 7 main states.
            They are (in sequence):
                    'clear'
                    'in function tag'
                    'in whitespace, post tag'
                    'in function name'
                    'in whitespace, post name'
                    'in parentheses'
                    'in whitespace, post parentheses'
        */
        case 'clear':
            if (sChar == oKeyWord.sGet() )
            {
                sFunctionState              = 'in function tag';
                sSuspectText                = sChar;
                if (oKeyWord.bIncr() )
                {
                    //--- Keyword was only 1 char long.
                    sFunctionState          = 'in whitespace, post tag';
                }
            }
            else
                GoodText                   += sChar;
        break;

        case 'in function tag':
            if (sChar == oKeyWord.sGet() )
            {
                if (oKeyWord.bIncr() )
                {
                    //--- Reached the end of the keyword.
                    sFunctionState          = 'in whitespace, post tag';
                }
            }
            else
            {
                //--- We found a non-matching character before the keyword was completed.
                oKeyWord.reset();
                bInvalidFuncDeclaration     = true;
            }
        break;

        case 'in whitespace, post tag':
            if (!/\s/.test (sChar) )            //-- Is not whitespace?
            {
                if (/\w/.test (sChar) )         //-- Legal name-char.
                {
                    sFunctionState          = 'in function name';
                }
                else if (sChar == '(')
                {
                    //--- This is the case of an anonymous function.
                    sFunctionState          = 'in parentheses';
                    iNumParentheses++;
                }
                else
                {
                    bInvalidFuncDeclaration = true;
                }
            }
        break;

        case 'in function name':
            if (!/\w/.test (sChar) )            //-- Not legal name-char?
            {
                if (/\s/.test (sChar) )         //-- Is whitespace?
                {
                    sFunctionState          = 'in whitespace, post name';
                }
                else if (sChar == '(')
                {
                    sFunctionState          = 'in parentheses';
                    iNumParentheses++;
                }
                else
                {
                    bInvalidFuncDeclaration = true;
                }
            }
        break;

        case 'in whitespace, post name':
            if (!/\s/.test (sChar) )            //-- Is not whitespace?
            {
                if (sChar == '(')
                {
                    sFunctionState          = 'in parentheses';
                    iNumParentheses++;
                }
                else
                {
                    bInvalidFuncDeclaration = true;
                }
            }
        break;

        case 'in parentheses':
            if (sChar == '(')
            {
                iNumParentheses++;
            }
            else if (sChar == ')')
            {
                iNumParentheses--;

                if (iNumParentheses == 0)
                {
                    sFunctionState  = 'in whitespace, post parentheses';
                    bInFunction     = false;
                }
            }
        break;

        case 'in whitespace, post parentheses':
            if (!/\s/.test (sChar) )            //-- Is not whitespace?
            {
                if (sChar == '{')
                {
                    sFunctionState              = 'clear';
                    sSuspectText                = '';
                    bInFunction                 = true;
                    iNumBraces++;
                }
                else
                {
                    bInvalidFuncDeclaration = true;
                }
            }
        break;

        default:
            throw new  Error ('Undefined sFunctionState: "' + sFunctionState + '"');
        break;
    }

    if (bInvalidFuncDeclaration)
    {
        GoodText                   += sSuspectText;
        sFunctionState              = 'clear';
        sSuspectText                = '';
        bInFunction                 = false;
    }
}

console.log (GoodText);     //-- Use alert() if not Firebug.

if (iNumBraces)         throw new  Error ('Mismatched Braces. ' + iNumBraces + ' left over.');
if (iNumParentheses)    throw new  Error ('Mismatched Parentheses. ' + iNumParentheses + ' left over.');
Brock Adams