I have a string that I read from a configuration file. The structure of the string is as follows:

(long_string)long_string(long_string)

Any item in brackets, including the brackets themselves, is optional. I have a regular expression that matches the whole string, but I could not figure out how to make parts of it optional with "?".

Here are a few valid input strings:

(a)like(1)
like(very long string here)
like

Here is my regexp, which matches only the first one:

^\((?<short>.*)\)(?<text>.*)\((?<return>.*)\)$

How can I convert my regexp to make brackets optional for a match?

+1  A: 

Surround the two sub-patterns with non-capturing groups (?:expr) and make them optional:

^(?:\((?<short>.*)\))?(?<text>.*)(?:\((?<return>.*)\))?$

And if possible, replace the catch-all .* with something more specific, such as [^()]+, so that no part can run across a parenthesis:

^(?:\((?<short>[^()]+)\))?(?<text>[^()]+)(?:\((?<return>[^()]+)\))?$
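
As a rough check of the pattern above, here is a minimal Python sketch run against the three sample strings from the question. Python spells named groups (?P<name>...) rather than (?<name>...); the group name ret (standing in for return) and the helper parse are choices made for this sketch, not part of the original answer.

```python
import re

# Optional "(short)" prefix, required text, optional "(ret)" suffix.
# [^()]+ stops each part at the nearest parenthesis.
PATTERN = re.compile(
    r"^(?:\((?P<short>[^()]+)\))?"
    r"(?P<text>[^()]+)"
    r"(?:\((?P<ret>[^()]+)\))?$"
)

def parse(line):
    """Return the named parts as a dict, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None

for sample in ["(a)like(1)", "like(very long string here)", "like"]:
    print(sample, "->", parse(sample))
```

An optional part that is absent from the input comes back as None in the dictionary.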
Gumbo
No need to wrap them in non-capturing groups. You can make a normal group optional, and it will still count, even if it did not match anything.
Tomalak
I think he doesn’t want to have the parentheses as part of the matches.
Gumbo
@Tomalak: the (?:) is needed to group the uncaptured literal parentheses with their captured contents.
ysth
Thanks a lot. This is exactly what I have been looking for.
And as of now all my unit tests are passing :D
@Gumbo: Good point.
Tomalak
It's the ? quantifier that makes things optional, not the non-capturing parens, (?:).
brian d foy
Yes, the `?` quantifier means zero or one repetitions. The `(?:…)` grouping only avoids creating an extra capturing group that could be referenced like the normal groups.
Gumbo
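
The difference discussed in these comments can be seen by counting groups. A small Python sketch (Python writes named groups as (?P<name>...); the variable names are made up for illustration): both variants make the bracketed part optional with ?, but the plain parentheses add an extra numbered group that also holds the brackets.

```python
import re

# Optional part wrapped in a capturing group: the wrapper becomes an extra numbered group.
capturing = re.compile(r"^(\((?P<short>[^()]+)\))?(?P<text>[^()]+)$")
# Same pattern with a non-capturing wrapper: only the named groups remain.
non_capturing = re.compile(r"^(?:\((?P<short>[^()]+)\))?(?P<text>[^()]+)$")

print(capturing.groups)      # number of capturing groups in the pattern
print(non_capturing.groups)

m1 = capturing.match("(a)like")
m2 = non_capturing.match("(a)like")
print(m1.groups())  # includes the wrapper group with its parentheses
print(m2.groups())  # only the contents of short and text
```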
+1  A: 

What I would do is move the ( and ) inside your capture groups, so instead of

\((?<short>.*)\)

change it to:

(?<short>\(.*\))

That way it will match the parentheses along with the inner text. Then, if they are present, use another regular expression to strip the parentheses.

I'm not very familiar with the named matches syntax so the group syntax might be off but you should get the idea.
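
A rough Python sketch of this two-step idea, assuming the bracketed groups are also marked optional with ? and each part is limited so it cannot run across a bracket (both are assumptions added here). strip_parens is a hypothetical helper, ret stands in for return, and Python spells named groups (?P<name>...).

```python
import re

# Step 1: capture the optional parts together with their parentheses.
PATTERN = re.compile(
    r"^(?P<short>\([^()]*\))?(?P<text>[^()]+)(?P<ret>\([^()]*\))?$"
)

def strip_parens(value):
    """Step 2: drop one leading "(" and one trailing ")" if the part is present."""
    if value is None:
        return None
    return re.sub(r"^\(|\)$", "", value)

m = PATTERN.match("(a)like(1)")
raw = m.group("short", "text", "ret")
print(raw)                                         # parts with their parentheses
print(tuple(strip_parens(part) for part in raw))   # parentheses removed
```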

whatsisname
A: 

Give this a try...

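string[] strings = new string[] { "(a)like(1)", "like(very long string here)", "like" };
foreach (string s in strings)
{
    // [^()]+ keeps each part from running across a parenthesis; a trailing ? marks that part as optional.
    System.Text.RegularExpressions.Match match = System.Text.RegularExpressions.Regex.Match(s, @"^(\((?<short>[^()]+)\))?(?<text>[^()]+)?(\((?<return>[^()]+)\))?$");
    if (match.Success)
    {
        // match.Groups["text"].Value holds the middle part;
        // match.Groups["short"].Success is false when that optional part is absent.
        // do logic to handle the match
    }
}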
John JJ Curtis
A: 

Well, just make them optional, then:

^(?<short>\(.*\))?(?<text>.*)(?<return>\(.*\))?$

I'm no big fan of named captures; they tend to make a pattern look more complicated than it is (at least to me). Also, I recommend against using ".*", because it is greedy and will happily match across the parentheses. My suggestion:

^(\([^)]*\))?([^(]*)(\([^)]*\))?$

and go for match group 2. But if you insist on using named captures:

^(?<short>\([^)]*\))?(?<text>[^(]*)(?<return>\([^)]*\))?$
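
To see why ".*" is risky here, a short Python comparison of the first and last patterns above may help. Python writes named groups as (?P<name>...); ret stands in for return, and the variable names are illustrative only. The greedy .* runs to the last closing bracket, so the first group swallows the whole line.

```python
import re

greedy = re.compile(r"^(?P<short>\(.*\))?(?P<text>.*)(?P<ret>\(.*\))?$")
specific = re.compile(r"^(?P<short>\([^)]*\))?(?P<text>[^(]*)(?P<ret>\([^)]*\))?$")

line = "(a)like(1)"
greedy_parts = greedy.match(line).group("short", "text", "ret")
specific_parts = specific.match(line).group("short", "text", "ret")

print(greedy_parts)    # .* in the first group runs to the final ")"
print(specific_parts)  # each part stops at its own bracket
```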
Tomalak
A: 

Using the code below, you will always get a @matches array consisting of three elements. If one of the optional parts did not match, the corresponding entry will be undef.

#!/usr/bin/perl

use strict;
use warnings;

my $optional = qr/(?:\(([^)]+?)\))?/;
my $required = qr/([^()]+)/;

while ( my $line = <DATA> ) {
    chomp $line;
    last unless $line =~ /\S/;

    if ( my @matches = ($line =~ /$optional$required$optional/) ) {
        no warnings 'uninitialized';
        print "---$_---\n" for @matches;
    }
}

__DATA__
(a)like(1)
like(very long string here)
like
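
For comparison, roughly the same idea in Python, offered as a sketch rather than a line-by-line translation of the Perl above: the pattern is assembled from reusable pieces, and groups() always returns three entries, with None where an optional part did not match. The names OPTIONAL, REQUIRED, and split_line are invented for this example.

```python
import re

OPTIONAL = r"(?:\(([^)]+?)\))?"   # optional "(...)" part, contents captured
REQUIRED = r"([^()]+)"            # required text without parentheses
PATTERN = re.compile(OPTIONAL + REQUIRED + OPTIONAL)

def split_line(line):
    """Return a 3-tuple (short, text, ret), or None when nothing matches."""
    m = PATTERN.search(line)
    return m.groups() if m else None

for line in ["(a)like(1)", "like(very long string here)", "like"]:
    print(split_line(line))
```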
Sinan Ünür