I'm trying to parse some AS3 using regular expressions. I cannot, for the life of me, figure out how to omit matches that are inside string quotations. I need to match test in the variable name testString, but not the test that's between the quotation marks. I don't want to match anything that's part of any string's content.

var testString:String = "This is a test String";
+1  A: 
^[^"]*test

will work for the example above, matching a test that has no quotes in front of it. Do you need to match a test that comes after quotes but on the same line, such as:

method("string",test);

If so, you will need something more complex, like

^[^"]*(?:"[^"]*")*[^"]*test

which will (hopefully) match any number (0 or more) of pairs of quotation marks, then test.
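
A quick way to check both patterns is to run them against a few sample lines. The sketch below is in Python rather than AS3, purely so it can be run standalone. The helper name unquoted_test_index and the third pattern are not from the answer: the third pattern is a hypothetical variant that lets unquoted text sit between quoted pairs, which the second pattern as written does not seem to allow. None of these patterns handle escaped or single quotes.

```python
import re

# The two patterns from the answer, unchanged.
BEFORE_ANY_QUOTE = re.compile(r'^[^"]*test')
AFTER_QUOTE_PAIRS = re.compile(r'^[^"]*(?:"[^"]*")*[^"]*test')

# Hypothetical variant (an assumption, not from the answer): each quoted pair
# may be preceded by unquoted text, so pairs need not be back to back.
INTERLEAVED_PAIRS = re.compile(r'^(?:[^"]*"[^"]*")*[^"]*test')


def unquoted_test_index(pattern, line):
    """Return the index where the matched 'test' starts, or None if no match."""
    m = pattern.match(line)
    return m.end() - len("test") if m else None


declaration = 'var testString:String = "This is a test String";'
call = 'method("string",test);'
only_quoted = 'trace("a test");'
two_strings = 'f("a", x, "b", test)'

print(unquoted_test_index(BEFORE_ANY_QUOTE, declaration))   # 4
print(unquoted_test_index(BEFORE_ANY_QUOTE, call))          # None
print(unquoted_test_index(AFTER_QUOTE_PAIRS, call))         # 16
print(unquoted_test_index(AFTER_QUOTE_PAIRS, only_quoted))  # None
print(unquoted_test_index(AFTER_QUOTE_PAIRS, two_strings))  # None
print(unquoted_test_index(INTERLEAVED_PAIRS, two_strings))  # 15
```

The last two lines show the difference: with two string arguments separated by other code, only the interleaved variant reaches the trailing test.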

Paul Creasey
+1  A: 

You can use code like the sample below, but remember that in AS3 a string can be "rfwerfwer" or 'sfsrfwervwer' or "fvsfv\"sdfvsdfv" or 'sfvsdfv\'fvsfvsdfv'. Parsing with regular expressions alone will be difficult.

The regexp says: capture in the first group all characters that are not a ", then optionally match a second group that starts with a "; if that group is present, capture in the third group every character up to the closing ".

So you will have var testString:String = in the first group and, if it exists, the string without the quotes in the third group: This is a test String.

In AS3:

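// Sample source line; single-quoted so it can contain double quotes.
var s:String='var testString:String = "This is a test String";';
// Group 1: text before the first double quote.
// Group 3: the content of the quoted string, if there is one.
var re:RegExp=/([^"]+)("([^"]+)")?/;
var o:Array=re.exec(s);
if (o){
 // Traces the code in front of the string: var testString:String =
 trace(o[1]);
}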
Patrick
+1  A: 

A regular expression that matches strings in C-like languages (with backslash escaping) looks like this:

"(\\.|[^"])*"

Basically: "match a quote, then any number of (escape sequence or not a quote), then a quote".

Matching outside strings is much trickier. The simplest approach is to parse in two passes: first, replace the above with nothing (i.e. eliminate all strings), and then find the subject in the rest (i.e. everything that is not a string).

That said, regular expressions are not a proper tool for parsing programming languages. Consider a parser generator: yacc, lemon or similar.
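
The two passes could look roughly like this. It is a minimal sketch in Python rather than AS3 (chosen only so it runs standalone), and it makes a few assumptions that are not in the answer: the string pattern also covers single-quoted strings and excludes the backslash from the "not a quote" class, and the function name find_outside_strings is made up for illustration. Comments and regex literals are not handled.

```python
import re

# Assumed string-literal pattern: double- or single-quoted, where a backslash
# escapes the next character. This is a variation on the pattern above.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'')


def find_outside_strings(source, word):
    """Return every identifier-like token containing `word` outside strings."""
    # Pass 1: eliminate all string literals.
    stripped = STRING_LITERAL.sub("", source)
    # Pass 2: search what is left, which by construction is not string content.
    return re.findall(r"\w*" + re.escape(word) + r"\w*", stripped)


line = 'var testString:String = "This is a test String";'
print(find_outside_strings(line, "test"))   # ['testString']

# Single quotes and escaped quotes are stripped as whole strings too.
tricky = 'trace(\'a test\', "say \\"test\\"", test2);'
print(find_outside_strings(tricky, "test"))  # ['test2']
```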

stereofrog
+1 true on both accounts
just somebody
+1  A: 

Patrick brought up some good points about escaped quotes and single-quoted strings, but it's even worse than that: what about comments? Comments can contain quotes (double or single), and string literals can contain things that look like comment delimiters. And don't forget regexes themselves: regex literals can contain any of those things, and regexes can also be written in the form of string literals for use with the RegExp constructor.

If you know in advance that such syntactic overlaps won't happen (or will be very limited), you might be able to do what you want, but it will probably be very ugly. But what you really need is a full-blown parser, or a completely different approach to the underlying problem. I know it sounds like a very simple thing to do, but it's just a really bad fit for the way regexes work.

Alan Moore
+1 for the additional info
Patrick
I haven't found anything that works... Thanks Alan. I think regex is not the right tool. I have come up with a workaround that works: I use ".*" to push all strings into an array, then later, when I'm done manipulating everything else, I put the strings back untouched.
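
For anyone wanting to try the same workaround, here is a rough sketch, written in Python rather than AS3 so it can be run standalone. Everything named here (stash_strings, restore_strings, rename_outside_strings, the NUL-delimited placeholder) is hypothetical. Note that ".*" is greedy and would swallow everything from the first quote to the last one on a line with several strings, so the sketch assumes an escape-aware pattern instead. It also assumes the source contains no NUL characters and that the text being replaced is not made of digits, since that would corrupt the placeholders.

```python
import re

# Assumed string-literal pattern (double or single quotes, backslash escapes).
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'')
# Placeholder left where a string used to be: NUL, index into the stash, NUL.
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")


def stash_strings(source):
    """Move every string literal into a list, leaving numbered placeholders."""
    stash = []

    def hide(match):
        stash.append(match.group(0))
        return "\x00%d\x00" % (len(stash) - 1)

    return STRING_LITERAL.sub(hide, source), stash


def restore_strings(masked, stash):
    """Put the stashed string literals back, untouched."""
    return PLACEHOLDER.sub(lambda m: stash[int(m.group(1))], masked)


def rename_outside_strings(source, old, new):
    """Example manipulation: replace `old` with `new` only outside strings."""
    masked, stash = stash_strings(source)
    return restore_strings(masked.replace(old, new), stash)


line = 'var testString:String = "This is a test String";'
print(rename_outside_strings(line, "test", "demo"))
# var demoString:String = "This is a test String";
```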
Luke