I thought I understood C# regular expressions, but clearly that's not the case. I need some help devising an expression that finds everything from START or BEGIN up to and including the closing )). The match can span multiple lines.

Ex.

START( FTP_STATE, XXX(
   VAL( FTP_INITIAL_STATE, 0 )
   VAL( FTP_INBOUND,       1 )
   VAL( FTP_OUTBOUND,      2 )
))

/**************************************************************/

BEGIN( FTP_TIMER_MODE, YYY(
   VAL( FTP_REMOVE_TIMER,     0 )
   VAL( FTP_NOT_REMOVE_TIMER, 1 )
))

/**************************************************************/

Any help is greatly appreciated.

+10  A: 

It is straightforward: START or BEGIN, then any number of any character, matched non-greedily, and finally the two closing parentheses. .*? matches any number of any character, but as few as possible. To match across more than one line, the single-line option (?s) must be enabled. (Thanks to Alan M. for pointing that out.)

(?s)(START|BEGIN).*?\)\)
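
As a rough illustration of how this pattern could be used from C#, here is a minimal sketch. The class name `Demo` and the shortened `input` string are made up for the example; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

class Demo
{
    static void Main()
    {
        // Abbreviated version of the sample from the question:
        // two blocks, each closed by "))" on its own line.
        string input =
            "START( FTP_STATE, XXX(\n" +
            "   VAL( FTP_INITIAL_STATE, 0 )\n" +
            "))\n" +
            "\n" +
            "BEGIN( FTP_TIMER_MODE, YYY(\n" +
            "   VAL( FTP_REMOVE_TIMER, 0 )\n" +
            "))\n";

        // (?s) lets '.' match newlines; the lazy .*? stops at the first "))".
        MatchCollection matches = Regex.Matches(input, @"(?s)(START|BEGIN).*?\)\)");

        foreach (Match m in matches)
        {
            // Group 1 holds the keyword that opened the block.
            Console.WriteLine(m.Groups[1].Value + " block, " + m.Length + " characters");
        }
    }
}
```

With this input the loop should report one START block and one BEGIN block, because the lazy quantifier ends each match at the nearest )) rather than the last one in the text.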
Daniel Brückner
Nice explanation
Joe Philllips
A: 

I don't know the syntax for C#, but in Perl it's:

m/(BEGIN|START).*?\)\)/s

The s modifier lets the dot match newlines, so the match can span multiple lines.

You just have to find the equivalent option in C#.

Nathan Fellman
That matches until the last )) in the file, not the first )) after BEGIN or START
Tmdean
A: 

If you don't understand regexps and want to learn them, let me recommend this regexp site.

The solution is probably something like /(START|BEGIN).*?\)\)/s

mpeterson
+4  A: 

Try this:

(?:START|BEGIN)(?:[^)]+|\)[^)])+\)\)

To explain it:

  • (?:START|BEGIN)   Start with either START or BEGIN.
  • (?:[^)]+|\)[^)])+    After that either any character other than a ) ([^)]+) or a ) that is followed by any character other than ) (\)[^)]) may follow. (So there is no way to match )) with this expression.)
  • \)\)   Finally the )).

I hope this will reduce backtracking.
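
A small sketch of this pattern in C#, under the assumption that it is applied to text shaped like the question's sample. The class name `NegatedClassDemo` and the shortened `input` are illustrative only. Because a negated character class such as [^)] also matches newlines, no Singleline option should be needed here.

```csharp
using System;
using System.Text.RegularExpressions;

class NegatedClassDemo
{
    static void Main()
    {
        // Shortened stand-in for the sample in the question.
        string input =
            "START( A, XXX(\n   VAL( B, 0 )\n))\n" +
            "BEGIN( C, YYY(\n   VAL( D, 1 )\n))\n";

        // No RegexOptions.Singleline: [^)] already matches line breaks.
        Regex re = new Regex(@"(?:START|BEGIN)(?:[^)]+|\)[^)])+\)\)");

        foreach (Match m in re.Matches(input))
        {
            Console.WriteLine(m.Value);
            Console.WriteLine("----"); // separator between matches
        }
    }
}
```

Each match should end at the first )) after its keyword, since the repeated group can consume a single ) only when the next character is not another ).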

Gumbo
+1  A: 

Actually you need to account for the VAL( ... )'s as well.

In Perl it would be:

(BEGIN|START)\([^(\)\)\))].+\)\)\)
  1. Starts with BEGIN or START
  2. Has an opening bracket
  3. Allows anything NOT ))) in between, to avoid greedy matching
  4. Ends with three closing brackets )))
Jukka Dahlbom
Not if the ending "))" is always on a line by itself. Anyway, your regex doesn't work. It looks like you're trying to use a character class as if it were a negative lookahead.
Alan Moore
Good call. Even though this regex will match the given text (at least in a quick PHP test using preg_), the character class does nothing.
Jukka Dahlbom
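
For reference, the negative lookahead mentioned in the comment above could look roughly like the following in C#. This is only a sketch of that technique, not the regex from the answer: it stops at )) as in the question, and the class name `LookaheadDemo` and the `input` string are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

class LookaheadDemo
{
    static void Main()
    {
        // One block followed by a later "))" that should NOT be part of the match.
        string input = "BEGIN( C, YYY(\n   VAL( D, 1 )\n))\ntrailing text ))\n";

        // (?!\)\)) refuses to consume a character that starts "))",
        // so the loop ends at the first "))" without relying on a lazy quantifier.
        // Singleline is needed so '.' can cross line breaks.
        Regex re = new Regex(
            @"(?:BEGIN|START)\((?:(?!\)\)).)*\)\)",
            RegexOptions.Singleline);

        Match m = re.Match(input);
        Console.WriteLine(m.Success);
        Console.WriteLine(m.Value);
    }
}
```

A character class like [^...] always matches exactly one character from a set, which is why it cannot express "anything that is not the sequence ))"; the lookahead checks the sequence at every position instead.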
A: 
ResultString = Regex.Match(subject, @"(START|BEGIN).*?\)\)", RegexOptions.Singleline).Value;
Alekc
I think it should be MultiLine?
Joe Philllips
Mmm, you are probably right. It is autogenerated code from RegexBuddy, since I work with PHP and not C# (the regex is right though :D)
Alekc
A: 

Try this

MatchCollection m = Regex.Matches(input, "(START|BEGIN).+?\\)\\)", RegexOptions.Singleline);
PaulB
+1  A: 
@"(?s)(?:START|BEGIN).*?\)\)"

What some of the others are calling "multiline mode" is actually single-line (or DOTALL) mode. That's the mode that lets the dot match newlines. Multiline mode lets '^' match the beginning of a line and '$' match the end of a line (normally, they only match the start and end of the whole string). I'm using DOTALL mode with the inline modifier "(?s)".
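
To make the difference concrete, here is a short sketch comparing the two options on the same text. The class name `OptionsDemo`, the shortened `input`, and the anchored pattern in the last call are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

class OptionsDemo
{
    static void Main()
    {
        string input = "START( A, XXX(\n   VAL( B, 0 )\n))\n";
        string pattern = @"(?:START|BEGIN).*?\)\)";

        // Multiline only changes ^ and $; '.' still stops at "\n",
        // so the pattern cannot reach the "))" two lines down. Prints False.
        Console.WriteLine(Regex.IsMatch(input, pattern, RegexOptions.Multiline));

        // Singleline lets '.' cross line breaks. Prints True.
        Console.WriteLine(Regex.IsMatch(input, pattern, RegexOptions.Singleline));

        // The two options are independent and can be combined:
        // '.' crosses lines while ^ and $ anchor at line boundaries. Prints True.
        Console.WriteLine(Regex.IsMatch(
            input,
            @"^START.*?^\)\)$",
            RegexOptions.Singleline | RegexOptions.Multiline));
    }
}
```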

Alan Moore
Thanks for pointing that multiline/singleline thing out. The names of the options are well chosen; you can enable singleline and multiline mode at the same time ... :D
Daniel Brückner
+1  A: 

You've already got your answer but I thought I'd throw in this link which I find very useful for building/testing expressions. You can quickly test things out and then paste the expression into your code when you're satisfied with it.

lJohnson