Hi all, I looked around for a while but probably couldn't find the right keywords to Google, so I'm asking here. I need to extract from a string the characters between two delimiters, without returning the delimiters themselves. I'm trying to do it with the C# Regex class.

A simple example should be helpful:

Target: extract the substring between square brackets, without returning the brackets themselves.

Base string: This is a test string [more or less]

If I use the following regex:

\[.*?\]

The match is "[more or less]". I need to get only "more or less", without the brackets. Is it possible to do it? Many thanks.

+3  A: 

Easy done:

(?<=\[)(.*?)(?=\])

Technically, that uses a lookbehind and a lookahead. See Lookahead and Lookbehind Zero-Width Assertions. The pattern consists of:

  • a lookbehind, (?<=\[), which requires a preceding [ without including it in the match;
  • a non-greedy captured group, (.*?), which stops at the first ]; and
  • a lookahead, (?=\]), which requires a following ] without including it in the match.
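
As a rough illustration of the three parts listed above, here is a minimal sketch in Python rather than C# (an assumption made for brevity; Python's re module accepts the same lookaround syntax). The second test string, "[a] and [b]", is made up to show why the non-greedy quantifier matters.

```python
import re

text = "This is a test string [more or less]"

# The lookbehind (?<=\[) and lookahead (?=\]) assert the brackets without
# consuming them, so the whole match is already the text between them.
match = re.search(r"(?<=\[)(.*?)(?=\])", text)
inner = match.group(0) if match else None
print(inner)  # more or less

# With two bracketed parts, the non-greedy .*? stops at the first ],
# while a greedy .* runs on to the last one.
two = "[a] and [b]"
lazy = re.search(r"(?<=\[).*?(?=\])", two).group(0)
greedy = re.search(r"(?<=\[).*(?=\])", two).group(0)
print(lazy)    # a
print(greedy)  # a] and [b
```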

Alternatively you can just capture what's between the square brackets:

\[(.*?)\]

and return the first captured group instead of the entire match.
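
To make the difference between the entire match and the first captured group concrete, here is a small Python sketch (Python is an assumption here, not the asker's language). In C#, the corresponding value should be available as match.Groups[1].Value, while match.Value holds the whole match.

```python
import re

text = "This is a test string [more or less]"
match = re.search(r"\[(.*?)\]", text)

whole = match.group(0)  # entire match, brackets included
inner = match.group(1)  # first captured group only
print(whole)  # [more or less]
print(inner)  # more or less
```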

cletus
"Easy done", LOL! :) Regular expressions always give me a headache; I tend to forget them as soon as I find the ones that solve my problems. About your solutions: the first works as expected, the second doesn't; it keeps including the brackets. I'm using C#, so maybe the Regex object has its own "flavour" of regex engine...
Diego
It's doing that because you're looking at the whole match rather than the first matched group.
cletus
Take a look at http://en.csharp-online.net/CSharp_Regular_Expression_Recipes%E2%80%94Extracting_Groups_from_a_MatchCollection
cletus
Many thanks, very useful website! I'll keep it as a reference. :) Sorry if I caused some confusion; C# development is not really one of my skills.
Diego
+3  A: 

You just need to 'capture' the bit between the brackets.

\[(.*?)\]

To capture you put it inside parentheses. You do not say which language this is using. In Perl for example, you would access this using the $1 variable.

my $string = 'This is the match [more or less]';
# On a successful match, $1 holds the text captured by the first group
if ($string =~ /\[(.*?)\]/) {
    print "match:$1\n";
}

Other languages will have different mechanisms. C#, for example, exposes captured groups through the Groups collection of the Match class, I believe.
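
As one example of another language's mechanism, here is a short Python sketch (the sample string with three bracketed parts is made up for illustration). When the pattern has exactly one capturing group, re.findall returns the group contents rather than the whole matches.

```python
import re

text = "first [one], then [two], finally [three]"

# One capturing group: findall yields what each group captured,
# so the brackets are left out of the results.
parts = re.findall(r"\[(.*?)\]", text)
print(parts)  # ['one', 'two', 'three']
```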

Xetius
Thanks, but this solution didn't work; it keeps including the square brackets. As I wrote in my comment on Cletus' solution, it could be that the C# Regex object interprets it differently. I'm not an expert in C# though, so it's just a conjecture; maybe it's just my lack of knowledge. :)
Diego
A: 

PHP:

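$string = 'This is the match [more or less]';
// preg_match takes the subject string as its second argument and fills $match;
// the non-greedy .*? stops at the first closing bracket
preg_match('#\[(.*?)\]#', $string, $match);
var_dump($match[1]);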
powtac
A: 

I use the following site to test my Regular Expressions:

http://www.gskinner.com/RegExr/

Have you tried the following:

([A-Za-z])