My regular expression is currently:

includes.push\("([^\"\"]*\.js)"\)

but it matches all of the following lines:

/*includes.push("javascriptfile.js")*/
/*
includes.push("javascriptfile.js")
*/
includes.push("javascriptfile.js");
includes.push("javascriptfile.js")

And I don't want it to match the lines within comments.

Any regex experts out there got any ideas?

Thanks :o)

Edit: I have tested a regex slightly adapted from madgnome's answer. This one handles the multiline comments in my test; can you see any problems with it?

includes\.push\("([^\"\"]*\.js)"\)(?!\n*\*/)

new test is:

/*includes.push("javascriptfile.js")*/
/*
includes.push("javascriptfile.js")
*/
includes.push("javascriptfile.js");
includes.push("javascriptfile.js");
/*includes.push("javascriptfile.js")*/
/*
includes.push("javascriptfile.js")
*/

This test also includes comments underneath the uncommented includes.push lines.

+2  A: 

Depending on your language, you could use negative lookbehind/lookahead

(?<!/\*)includes\.push\("([^\"\"]*\.js)"\)(?!\*/)
  • (?<!/\*) asserts that /\* does not match immediately before the current position
  • (?!\*/) asserts that \*/ does not match immediately after the current position

This regex won't work for multiline comments like your second example, so you should trim the input before use.

Edit: You are using JavaScript, and negative lookbehind doesn't work in JavaScript engines that predate ES2018, so you could use only the negative lookahead, like this:

includes\.push\("([^\"\"]*\.js)"\)(?![\r\n\s]*\*/)

(This regex works for multiline comments like your second example, but it fails on malformed comments: a */ without a matching /*.)
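As a rough check of the lookahead-only pattern, the sketch below runs it over a small sample. The sample text and its file names (a.js to e.js) are made up for illustration, and console.log is assumed to be available. It also shows a limitation that seems worth knowing: a commented-out call that is followed by other text before the closing */ is still matched, because the lookahead only inspects whitespace after the closing parenthesis.

```javascript
// Lookahead-only pattern from the answer, with the global flag added
// so that every occurrence is visited.
var pattern = /includes\.push\("([^\"\"]*\.js)"\)(?![\r\n\s]*\*\/)/g;

// Hypothetical sample input; the file names are invented for this sketch.
var sample =
    '/*includes.push("a.js")*/\n' +
    '/*\n' +
    'includes.push("b.js")\n' +
    '*/\n' +
    'includes.push("c.js");\n' +
    'includes.push("d.js")\n' +
    '/* includes.push("e.js") disabled for now */\n';

var found = [];
var m;
while ((m = pattern.exec(sample)) !== null) {
  found.push(m[1]); // capture group 1 holds the file name
}

// a.js and b.js are skipped; e.js is a false positive because text
// other than whitespace sits between the call and the closing */.
console.log(found.join(", ")); // prints "c.js, d.js, e.js"
```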

madgnome
Thanks, it's the multiline one which has really got me stumped! Unfortunately this is the one which raised the issue in the first place, and I can't strip out comments beforehand.
Dave Taylor
The second regex in my answer works for multiline
madgnome
Dave, can you replace comments with a unique token, then do the work, then replace the unique token with the original comment again?
Peter Boughton
Thanks, it didn't seem to pick up the multiline ones in my test, but this one seems to. Can you see any problems with it? `includes\.push\("([^\"\"]*\.js)"\)(?!\n*\*/)`
Dave Taylor
Try this `includes\.push\("[^"]*\.js"\)(?![\r\n]*\*/)`
madgnome
Cracked it... brilliant, thanks for your help. `includes\.push\("([^\"\"]*\.js)"\)(?![\r\n\s]*\*/)` it is.
Dave Taylor
A: 

You could just match either comments (multi- or single line), or a string literal and inspect the entire match-array:

var text = 
    "// \"foo\" \n" +
    "var s = \"no /* comment */ in here \"; \n" +
    "/*includes.push(\"javascriptfileA.js\")*/\n" +
    "/*\n" +
    "includes.push(\"javascriptfileB.js\")\n" +
    "*/\n" +
    "includes.push(\"javascriptfileC.js\");\n" +
    "includes.push(\"javascriptfileD.js\")\n";

print("--------------------------------------\ntext:\n");

var hits = text.match(/\/\/[^\r\n]*|\/\*[\s\S]*?\*\/|"(?:\\.|[^\\"])*"/g);

print(text);

print("--------------------------------------\nhits:\n");

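// match() returns null when nothing matches, so fall back to an empty array
hits = hits || [];
for (var i = 0; i < hits.length; i++) {
  var hit = hits[i];
  // keep only string literals; comments start with "/"
  if (hit.indexOf("\"") === 0) {
    print(hit);
  }
}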

produces:

--------------------------------------
text:

// "foo" 
var s = "no /* comment */ in here "; 
/*includes.push("javascriptfileA.js")*/
/*
includes.push("javascriptfileB.js")
*/
includes.push("javascriptfileC.js");
includes.push("javascriptfileD.js")

--------------------------------------
hits:

"no /* comment */ in here "
"javascriptfileC.js"
"javascriptfileD.js"

A short explanation of the regex:

//[^\r\n]*         # match a single line comment
|                  # OR
/\*[\s\S]*?\*/     # match a multi-line comment
|                  # OR
"(?:\\.|[^\\"])*"  # match a string literal

Tested online on IDEone.
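One possible variation on this idea, sketched here rather than taken from the answer itself: add the includes.push call as its own alternative with a capture group, so that only uncommented calls yield a file name and ordinary string literals are ignored. The helper name extractIncludes and the variable tokenPattern are hypothetical, and console.log is assumed in place of print. Regex literals in the scanned source are not handled, so treat this as a starting point only.

```javascript
// Comments, the includes.push("...js") call, and string literals, in that order.
// Only the includes.push alternative has a capture group.
var tokenPattern =
    /\/\/[^\r\n]*|\/\*[\s\S]*?\*\/|includes\.push\("([^"]*\.js)"\)|"(?:\\.|[^\\"])*"/g;

// Hypothetical helper: returns the file names of uncommented includes.
function extractIncludes(source) {
  var names = [];
  var m;
  tokenPattern.lastIndex = 0; // reset, since the pattern is global and reused
  while ((m = tokenPattern.exec(source)) !== null) {
    // m[1] is undefined when a comment or a plain string literal matched
    if (m[1] !== undefined) {
      names.push(m[1]);
    }
  }
  return names;
}

// Same sample text as in the answer above.
var sample =
    '// "foo" \n' +
    'var s = "no /* comment */ in here "; \n' +
    '/*includes.push("javascriptfileA.js")*/\n' +
    '/*\n' +
    'includes.push("javascriptfileB.js")\n' +
    '*/\n' +
    'includes.push("javascriptfileC.js");\n' +
    'includes.push("javascriptfileD.js")\n';

console.log(extractIncludes(sample).join("\n"));
```

Because comments are consumed as whole tokens before the includes.push alternative is tried inside them, commented-out calls never reach the capture group.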

Bart Kiers