views:

4636

answers:

8

I need a regular expression to select all the text between two outer brackets.

Example: some text(text here(possible text)text(possible text(more text)))end text

Result: (text here(possible text)text(possible text(more text)))

I've been trying for hours; mind you, my regular expression knowledge isn't what I'd like it to be :-) so any help will be gratefully received.

+4  A: 
[^\(]*(\(.*\))[^\)]*

[^\(]* matches everything that isn't an opening bracket at the beginning of the string, (\(.*\)) captures the required substring enclosed in brackets, and [^\)]* matches everything that isn't a closing bracket at the end of the string. Note that this expression does not attempt to match brackets; a simple parser (see dehmann's answer) would be more suitable for that.
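A minimal C# sketch of this pattern, run against the question's example and against the "text(text)text(text)text" input raised in the comments. The helper name `FirstOpenToLastClose` is hypothetical; the pattern is used exactly as written above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the pattern above.
//   [^\(]*    skips the text before the first "("
//   (\(.*\))  group 1: from the first "(" to the LAST ")" (.* is greedy)
//   [^\)]*    consumes the non-")" text after that
// If nothing matches, Groups[1].Value is the empty string.
static string FirstOpenToLastClose(string input) =>
    Regex.Match(input, @"[^\(]*(\(.*\))[^\)]*").Groups[1].Value;

Console.WriteLine(FirstOpenToLastClose(
    "some text(text here(possible text)text(possible text(more text)))end text"));
// (text here(possible text)text(possible text(more text)))

// Brackets are not counted, so two separate groups are merged into one span:
Console.WriteLine(FirstOpenToLastClose("text(text)text(text)text"));
// (text)text(text)
```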

Zach Scrivena
The bracket inside the character class does not need to be escaped, since inside a class it is not a metacharacter.
José Leal
Why not just: \(.+\)
some
This expr fails against something like "text(text)text(text)text" returning "(text)text(text)". Regular expressions can't count brackets.
SealedSun
@José: You are right of course. But for consistency, I just escape them anyway =)
Zach Scrivena
@some: That works too. The longer version matches the whole string explicitly though.
Zach Scrivena
@SealedSun: That's right. I've updated my answer with a disclaimer.
Zach Scrivena
A: 
(?<=\().*(?=\))

If you want to select text between two matching parentheses, you are out of luck with regular expressions. This is impossible.

This regex just returns the text between the first opening and the last closing parentheses in your string.
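A short C# check of this look-around pattern (the helper name `BetweenFirstAndLast` is hypothetical). It shows that the outer parentheses are excluded from the match and that `.*` still runs to the last `)`, even when the brackets in between do not pair up.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the look-around pattern above.
//   (?<=\()  lookbehind: the match must start right after a "("
//   .*       greedy, so it extends as far as possible on the line
//   (?=\))   lookahead: the match must end right before a ")"
// Both assertions are zero-width, so the brackets themselves are not returned.
static string BetweenFirstAndLast(string input) =>
    Regex.Match(input, @"(?<=\().*(?=\))").Value;

Console.WriteLine(BetweenFirstAndLast(
    "some text(text here(possible text)text(possible text(more text)))end text"));
// text here(possible text)text(possible text(more text))

Console.WriteLine(BetweenFirstAndLast("a(b)c(d)e"));
// b)c(d
```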

Tomalak
What do the "<=" and "=" signs mean? What regexp engine is this expression targeting?
SealedSun
This is look-around, or more correctly "zero width look-ahead/look-behind assertions". Most modern regex engines support them.
Tomalak
According to the OP's example, he wants to include the outermost parens in the match. This regex throws them away.
Alan Moore
@Alan M: You are right. But according to the question text, he wants everything _between_ the outermost parens. Take your pick. He said he'd been trying for hours, so I didn't even consider "everything including the outermost parens" as the intention, because it is so trivial: "\(.*\)".
Tomalak
+12  A: 

Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.

But there is a simple algorithm to do this, which I described in this answer to a previous question.
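The linked answer is not reproduced here, so the following is only a sketch of the usual depth-counter approach in C#, which may differ from that answer in detail; `OuterGroup` is a hypothetical name. It walks the string once, incrementing a counter on `(` and decrementing it on `)`, and returns as soon as the counter falls back to zero.

```csharp
using System;

// Hypothetical helper: returns the first complete outermost "(...)" group,
// or "" if there is none. Uses a depth counter instead of a regex.
static string OuterGroup(string s)
{
    int depth = 0, start = -1;
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] == '(')
        {
            if (depth == 0) start = i;      // an outermost group opens here
            depth++;
        }
        else if (s[i] == ')' && depth > 0)  // a ")" with no open "(" is skipped
        {
            depth--;
            if (depth == 0)                 // back at the top level: group complete
                return s.Substring(start, i - start + 1);
        }
    }
    return "";                              // no brackets, or never closed
}

Console.WriteLine(OuterGroup(
    "some text(text here(possible text)text(possible text(more text)))end text"));
// (text here(possible text)text(possible text(more text)))
Console.WriteLine(OuterGroup("text(text)text(text)text"));
// (text)
```

Unlike the regex answers above, this never merges two separate groups, because it knows exactly when each group closes.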

I was toying with this idea but thought I might be able to do it with RegExp. Will go back to my original plan. Thanks everyone
DaveF
.NET's implementation has Balancing Group Definitions (http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition), which allow this sort of thing.
DGGenuine
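A minimal C# sketch of a balancing-group pattern applied to the question's input. The pattern is a commonly used form of the idiom and is an assumption here, not copied from the linked page; `FirstBalancedGroup` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper using a .NET balancing-group pattern:
//   [^()]+          runs of non-bracket text
//   \((?<depth>)    an inner "(" pushes an empty capture onto the "depth" stack
//   \)(?<-depth>)   an inner ")" pops it; this alternative fails if the stack is empty
//   (?(depth)(?!))  after the loop, fail if any inner "(" is still open
static string FirstBalancedGroup(string input) =>
    Regex.Match(input,
        @"\((?>[^()]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!))\)").Value;

Console.WriteLine(FirstBalancedGroup(
    "some text(text here(possible text)text(possible text(more text)))end text"));
// (text here(possible text)text(possible text(more text)))
Console.WriteLine(FirstBalancedGroup("text(text)text(text)text"));
// (text)
```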
A: 

Try this:

(?<=(?:[^(]|^))(.*)(?=[^)]|$)

and use group 1.

José Leal
+1  A: 

The answer depends on whether you need to match matching sets of brackets, or merely the first open to the last close in the input text.

If you need to match matching nested brackets, then you need something more than regular expressions; see @dehmann.

If it's just first open to last close, see @Zach.

Decide what you want to happen with:

abc ( 123 ( foobar ) def ) xyz ) ghij

You need to decide what your code needs to match in this case.
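To make the choice concrete, here is a C# sketch that runs both interpretations on that input: the greedy `\(.*\)` (first open to last close) and a .NET balancing-group pattern (first complete matched set). The balancing-group pattern is an assumed, commonly used form of that idiom.

```csharp
using System;
using System.Text.RegularExpressions;

const string Input = "abc ( 123 ( foobar ) def ) xyz ) ghij";

// Interpretation 1: first "(" to last ")" (greedy .*, no bracket counting).
string firstToLast = Regex.Match(Input, @"\(.*\)").Value;

// Interpretation 2: first matching set, via a .NET balancing-group pattern;
// the stray ")" before " ghij" is left outside the match.
string matchingSet = Regex.Match(Input,
    @"\((?>[^()]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!))\)").Value;

Console.WriteLine(firstToLast);   // ( 123 ( foobar ) def ) xyz )
Console.WriteLine(matchingSet);   // ( 123 ( foobar ) def )
```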

Douglas Leeder
A: 

I needed to get the innermost brackets, to evaluate them and substitute their results.

((?<=(?:[(]))[^(].*?(?=[)]))

Posting it here so that I can retrieve it later if needed.

Sorry, that was only half. Here's the one that also captures empty brackets, which makes it easier to replace the values and remove the brackets at the same time:

\((?:(?<=(?:[(]))[^(].*?(?=[)]))\)|\(\)

Best regards,
DD
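A sketch of the evaluate-and-substitute loop in C#. It deliberately swaps in the simpler innermost pattern `\(([^()]*)\)` instead of the patterns above: in those, `[^(]` only constrains the first character after the bracket, so the lazy `.*?` can still run across an inner `(` (against `(1+(2+3))`, the first pattern's first match is `1+(2+3`). The `Evaluate` function, which just adds up `+`-separated integers, is a hypothetical stand-in for whatever evaluation you actually need.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical evaluator, for illustration only: adds up "+"-separated integers.
// An empty group "()" evaluates to 0.
static string Evaluate(string body) =>
    body.Split('+', StringSplitOptions.RemoveEmptyEntries)
        .Sum(part => int.Parse(part.Trim()))
        .ToString();

// \(([^()]*)\) matches only a bracket pair with no "(" or ")" inside it,
// i.e. an innermost group (including "()"). Each pass replaces every
// innermost group, brackets and all, with its value; repeat until none remain.
static string Reduce(string expr)
{
    var innermost = new Regex(@"\(([^()]*)\)");
    while (innermost.IsMatch(expr))
        expr = innermost.Replace(expr, m => Evaluate(m.Groups[1].Value));
    return expr;
}

Console.WriteLine(Reduce("(1+(2+3)+(4))")); // 10
```

The loop always terminates, because every pass removes at least one bracket pair and the evaluator's output contains no brackets.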
A: 

Check out balancing groups; they are designed for exactly this.

Nicolas Dorier
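Assuming the usual balancing-group idiom (the pattern below is an assumption, not taken from this answer), `Regex.Matches` also returns every top-level group separately, which is exactly the "text(text)text(text)text" case that the greedy patterns merge. `TopLevelGroups` is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every top-level balanced "(...)" group, left to right.
// "(" pushes onto the "depth" stack, ")" pops it, and (?(depth)(?!))
// rejects a candidate that leaves any "(" unclosed.
static string[] TopLevelGroups(string input) =>
    Regex.Matches(input, @"\((?>[^()]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!))\)")
         .Select(m => m.Value)
         .ToArray();

Console.WriteLine(string.Join(", ", TopLevelGroups("text(text)text(text)text")));
// (text), (text)
Console.WriteLine(string.Join(", ", TopLevelGroups("a(b(c))d(e)")));
// (b(c)), (e)
```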
A: 

Here is exactly how you do it:

At the top of the file:

using System;
using System.Text.RegularExpressions;

Then:

string Input = "hello(sometext(some more) after text) and more) and some";
string Pattern = @"\((.*)\)";
Match m = Regex.Match(Input, Pattern);
Console.WriteLine("(" + m.Index + ", " + m.Length + ")" + " ---> " + m.Value);

This matches from the first opening bracket to the last closing bracket.

So for input like "hello(sometext(some more) after text) and more) and some", it returns "(sometext(some more) after text) and more)".
A simple count of how many opening and closing brackets you have puts you one step closer to knowing if they match...
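A C# sketch of that counting idea (the helper names are hypothetical). Equal counts are necessary but not sufficient, which is why counting only gets you "one step closer": `)x(` has one of each but is clearly not balanced. Also checking that the running depth never drops below zero closes that gap for a single bracket type.

```csharp
using System;
using System.Linq;

// Equal numbers of "(" and ")": necessary for balanced brackets, not sufficient.
static bool SameCount(string s) =>
    s.Count(c => c == '(') == s.Count(c => c == ')');

// Balanced: the running depth never goes negative and ends at zero.
static bool IsBalanced(string s)
{
    int depth = 0;
    foreach (char c in s)
    {
        if (c == '(') depth++;
        else if (c == ')')
        {
            depth--;
            if (depth < 0) return false;   // a ")" with no "(" before it
        }
    }
    return depth == 0;
}

Console.WriteLine(SameCount("hello(sometext(some more) after text) and more) and some")); // False
Console.WriteLine(SameCount(")x("));   // True
Console.WriteLine(IsBalanced(")x("));  // False
```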