So I'm trying to parse a file that has text in this format:

outerkey = (innerkey = innervalue)

It gets more complex. This is also legal in the file:

outerkey = (innerkey = (twodeepkey = twodeepvalue)(twodeepkey2 = twodeepvalue2))

So basically I want to capture only the text belonging to outerkey. I cannot guarantee that all of the text will be on one line, because the value may span multiple lines. There is also more than one item in the file.

So here's my regex so far:

[^\s=]+\s*=\s*(\(\s*.*\s*\))

The goal is to replace the first part, [^\s=]+, with the key I want to search on and get the entire text of the outer parentheses.

Here's the problem: my regex captures not only the text I want, but also the text of the next group, because .* is greedy. Making it non-greedy would not work either, since it would stop capturing at the first closing parenthesis.
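To make the failure concrete, here is a small C# sketch that runs the pattern above in its greedy form and in a lazy .*? variant over the sample input from the question. It assumes .NET's System.Text.RegularExpressions, RegexOptions.Singleline (the question says singleline mode is on), and \n line endings.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question, with line endings normalized to \n.
string text = "foo =\n(\n  ifoo = ifoov\n)\n\nbar =\n(\n  ibar =\n    (iibar = iibarv)\n    (iibar2 = iibarv2)\n)\n";

// Greedy: .* runs to the end of the input, then backtracks only as far as the LAST ')'.
var greedy = new Regex(@"[^\s=]+\s*=\s*(\(\s*.*\s*\))", RegexOptions.Singleline);

// Lazy: .*? stops at the FIRST ')' that lets the rest of the pattern match.
var lazy = new Regex(@"[^\s=]+\s*=\s*(\(\s*.*?\s*\))", RegexOptions.Singleline);

foreach (Match m in greedy.Matches(text))
    Console.WriteLine("greedy: [" + m.Groups[1].Value + "]");

foreach (Match m in lazy.Matches(text))
    Console.WriteLine("lazy:   [" + m.Groups[1].Value + "]");
```

The greedy version produces a single match whose value runs from foo's opening parenthesis to bar's final closing one. The lazy version gets foo right but cuts bar off right after (iibar = iibarv). Neither can tell which ')' balances which '(', which is exactly what a plain regex cannot count.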

Ultimately, if I have the following string:

foo = 
(
  ifoo = ifoov
)

bar =
(
  ibar =
    (iibar = iibarv)
    (iibar2 = iibarv2)
)

I want the groups to match:

(
  ifoo = ifoov
)

and

(
  ibar =
    (iibar = iibarv)
    (iibar2 = iibarv2)
)

Right now it matches:

(
  ifoo = ifoov
)

bar =
(
  ibar =
    (iibar = iibarv)
    (iibar2 = iibarv2)
)

By the way, I am running this in multiline and singleline mode.

Any ideas? Thanks!

+3  A: 

I was able to adapt the .NET regex feature called balancing group definitions to this problem as follows:

using System;
using System.Text.RegularExpressions;

Regex r = new Regex(@"(?x) # for sanity!

    (?'Key' [^=\s]* )
    \s*=\s*
    (?'Value'
      (
         (
           [^()]*
           (?'Open'\()
         )+
         (
           [^()]*
           (?'Close-Open'\))
         )+
      )+?
    )
    (?(Open)(?!))

");

We can then test it as follows:

var text = @"
foo = 
(
  ifoo = ifoov
)

bar =
(
  ibar =
    (iibar = iibarv)
    (iibar2 = iibarv2)
)

outerkey = (innerkey = (twodeepkey = twodeepvalue)(twodeepkey2 = twodeepvalue2))
";

foreach (Match m in r.Matches(text)) {
  Console.WriteLine("Key: [{0}]", m.Groups["Key"]);
  Console.WriteLine("Value: [{0}]", m.Groups["Value"]);
  Console.WriteLine("-------");
}
Console.WriteLine("That's all folks!");

This prints (as seen on ideone.com):

Key: [foo]
Value: [(
  ifoo = ifoov
)]
-------
Key: [bar]
Value: [(
  ibar =
    (iibar = iibarv)
    (iibar2 = iibarv2)
)]
-------
Key: [outerkey]
Value: [(innerkey = (twodeepkey = twodeepvalue)(twodeepkey2 = twodeepvalue2))]
-------
That's all folks!

The minor modifications to the example pattern from the documentation are:

  • The open, close, and "neither" brackets are now \(, \), and [^()] instead of <, >, and [^<>]
  • The balanced structure is repeated with +? (at least one, but as few as possible) instead of *
  • "content" is matched before, not after the parentheses
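The question wants to plug in a specific key instead of [^=\s]*. A hypothetical helper, not part of the answer above, could build the same balancing-group pattern around one key. Two assumptions of mine: the key goes through Regex.Escape (which also escapes spaces and '#', so it stays safe under (?x)), and an added (?<!\S) lookbehind stops a key such as bar from matching inside ibar. FindValue and sample are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the parenthesized value for `key`,
// or an empty string if no balanced value is found.
static string FindValue(string text, string key)
{
    string pattern =
        @"(?x)
          (?<!\S)                 # assumption: key starts the text or follows whitespace
        " + Regex.Escape(key) + @"
          \s*=\s*
          (?'Value'
            (
              ( [^()]* (?'Open'\() )+
              ( [^()]* (?'Close-Open'\)) )+
            )+?
          )
          (?(Open)(?!))           # fail if any opening paren is still unclosed";

    Match m = Regex.Match(text, pattern);
    return m.Success ? m.Groups["Value"].Value : "";
}

string sample =
    "foo =\n(\n  ifoo = ifoov\n)\n\n" +
    "bar =\n(\n  ibar =\n    (iibar = iibarv)\n    (iibar2 = iibarv2)\n)\n\n" +
    "outerkey = (innerkey = (twodeepkey = twodeepvalue)(twodeepkey2 = twodeepvalue2))\n";

Console.WriteLine("[" + FindValue(sample, "bar") + "]");
```

Like the answer's pattern, this stops after the first balanced group following the '='. So for an inner key such as ibar, whose value is two adjacent groups, it would return only (iibar = iibarv).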
polygenelubricants
I'm not worried about the first part. I understand that I'm trying to match everything that's not a space or equals sign.
Jason Thompson
+1 for the comment about balanced parentheses. People often miss that about regular expressions.
Darron
My regex got mangled because of Stack Overflow's XSS filter.
Jason Thompson
@Jason: OK, I was having internet troubles, but I think I've got the pattern you want. Check it out.
polygenelubricants
+2  A: 

Generally speaking, a regex cannot count matches, so this is not easy to accomplish. .NET, however, has a feature called "balancing group definitions". The example in its documentation shows how to match paired angle brackets and should get you there.
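To show what the feature does on its own, below is, to the best of my knowledge, the angle-bracket pattern from the .NET balancing-group documentation, wrapped in a small check. Double-check it against the current docs. The three test strings are my own.

```csharp
using System;
using System.Text.RegularExpressions;

// (?'Open'<)        pushes a capture onto the Open stack for every '<'.
// (?'Close-Open'>)  pops one Open capture for every '>' (fails if Open is empty).
// (?(Open)(?!))     fails the whole match if any Open capture is left over.
var balanced = new Regex(
    @"^[^<>]*(((?'Open'<)[^<>]*)+((?'Close-Open'>)[^<>]*)+)*(?(Open)(?!))$");

foreach (string input in new[] { "<abc><mno<xyz>>", "<abc>>", "<<abc>" })
    Console.WriteLine($"{input,-16} balanced: {balanced.IsMatch(input)}");
```

Only the first input should be reported as balanced. The second has an extra '>', and the third leaves an unclosed '<' on the Open stack. The answer above uses the same idea with \( and \) in place of < and >.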

Scott Evernden
Rock on! That's exactly what I needed. My final regex looks like this:

^test\s*=\s*(((?<OpenP>\()[^\(\)]*)+((?<Value-OpenP>\))[^\(\)]*)+)*(?(OpenP)(?!))$

test is the key I'm searching on. All I need to do is look at the Value group and I've got what I want.
Jason Thompson
*sigh* Stack Overflow's XSS filter strikes again! There should be some escaped parentheses in my comment.
Jason Thompson
A: 

It's not regex, but it's fairly straightforward to accomplish this with a stack. While scanning your text:

  • If you see an opening parenthesis, push it on the stack. Start collecting characters.
  • If you see another opening parenthesis, push it on the stack.
  • If you see a closing parenthesis, pop an element from the stack.
  • If the stack is now empty, you have reached the balancing closing parenthesis. The characters you collected are the text between the outer parentheses.
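The steps above can be sketched in C# roughly like this. It assumes parentheses never appear inside quoted or escaped values, and OuterGroups is a hypothetical name. Because the stack only ever holds '(' characters, a plain int depth counter would work just as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: returns every outermost "( ... )" group in `text`.
static List<string> OuterGroups(string text)
{
    var groups = new List<string>();
    var stack = new Stack<char>();
    var current = new StringBuilder();

    foreach (char c in text)
    {
        if (c == '(')
        {
            stack.Push(c);          // every '(' is pushed
            current.Append(c);      // collecting starts at the first '('
        }
        else if (c == ')')
        {
            if (stack.Count == 0)
                throw new FormatException("Unmatched ')'");
            stack.Pop();
            current.Append(c);
            if (stack.Count == 0)   // stack empty again: this ')' balances the outer '('
            {
                groups.Add(current.ToString());
                current.Clear();
            }
        }
        else if (stack.Count > 0)
        {
            current.Append(c);      // only collect while inside parentheses
        }
    }

    if (stack.Count > 0)
        throw new FormatException("Unclosed '('");
    return groups;
}

string sample = "foo =\n(\n  ifoo = ifoov\n)\n\nbar =\n(\n  ibar =\n    (iibar = iibarv)\n    (iibar2 = iibarv2)\n)\n";
foreach (string g in OuterGroups(sample))
    Console.WriteLine("[" + g + "]");
```

This only returns the groups. To pair each one with its key, you could record the text just before the '=' that precedes each outermost '(', which is where a regex for the key alone (without the nesting) is still handy.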
Corbin March