I want to split a string like this:

abc//def//ghi

into a part before and after the first occurrence of //:

a: abc
b: //def//ghi

I'm currently using this regex:

(?<a>.*?)(?<b>//.*)

Which works fine so far.

However, sometimes the // is missing in the source string and obviously the regex fails to match. How is it possible to make the second group optional?

An input like abc should be matched to:

a: abc
b: (empty)

I tried (?<a>.*?)(?<b>//.*)? but that left me with lots of NULL results in Expresso so I guess it's the wrong idea.

+5  A: 

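Add a ^ at the beginning of your expression to anchor it to the start of the string and a $ at the end to anchor it to the end of the string. Without the $, the ungreedy .*? is satisfied by matching nothing and the optional group is simply skipped; the $ forces the match to cover the whole string.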

^(?<a>.*?)(?<b>//.*)?$
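
To see why the anchors matter, here is a small Python sketch comparing the unanchored pattern from the question with the anchored one above. The helper name split_first is made up for this illustration, and Python spells named groups (?P<name>...) instead of the .NET-style (?<name>...), so treat it as a rough equivalent rather than the exact Expresso behaviour.

```python
import re

# Pattern from the question (no anchors) vs. the anchored version.
# Python uses (?P<name>...) for named groups.
unanchored = re.compile(r"(?P<a>.*?)(?P<b>//.*)?")
anchored = re.compile(r"^(?P<a>.*?)(?P<b>//.*)?$")

def split_first(pattern, text):
    """Hypothetical helper: return groups a and b of the first match."""
    m = pattern.search(text)
    return m.group("a"), m.group("b")

# Unanchored: the lazy .*? matches nothing and the optional group is skipped.
print(split_first(unanchored, "abc//def//ghi"))  # ('', None)

# Anchored: the $ forces the match to consume the whole string.
print(split_first(anchored, "abc//def//ghi"))    # ('abc', '//def//ghi')
print(split_first(anchored, "abc"))              # ('abc', None)
```

The empty first match from the unanchored pattern is most likely what showed up as the NULL results mentioned in the question.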
Stevo3000
Obviously the second expression in the question (the one with the trailing ? after the second group).
Stevo3000
I get a single NULL result when I try this.
mafutrct
@mafutrct - I hadn't run it through Expresso, so I hadn't noticed the problem with the ungreedy match. I've added a $ to fix it. Works correctly now.
Stevo3000
Great! Thanks to Kamarey as well, who just deleted his correct answer.
mafutrct
`^(?<a>.*?)(?://(?<b>.*))?$` would probably be better. No need to capture `//` .
Brad Gilbert
@Brad Gilbert - That was an expression I considered suggesting, but the example text at the top of the question has multiple // instances, so I left the // in the second group for consistency's sake.
Stevo3000
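
For comparison, the two variants discussed in these comments differ only in whether the // ends up inside group b. A minimal Python sketch, assuming Python's (?P<name>...) syntax and a made-up helper called groups that turns an unmatched b into an empty string (which is what the question asks for):

```python
import re

# Variant from the answer: the // is captured as part of group b.
with_sep = re.compile(r"^(?P<a>.*?)(?P<b>//.*)?$")
# Variant from the comment: the // sits in a non-capturing group.
without_sep = re.compile(r"^(?P<a>.*?)(?://(?P<b>.*))?$")

def groups(pattern, text):
    """Hypothetical helper: return (a, b), with b as '' when it did not match."""
    m = pattern.match(text)
    return m.group("a"), m.group("b") or ""

print(groups(with_sep, "abc//def//ghi"))     # ('abc', '//def//ghi')
print(groups(without_sep, "abc//def//ghi"))  # ('abc', 'def//ghi')
print(groups(without_sep, "abc"))            # ('abc', '')
```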
A: 

A proof of Stevo3000's answer (Python):

import re

test_strings = ['abc//def//ghi', 'abc//def', 'abc']

# re.match() already anchors at the start of the string, so no leading ^ is needed.
regex = re.compile(r"(?P<a>.*?)(?P<b>//.*)?$")

for ts in test_strings:
    match = regex.match(ts)
    print('a:', match.group('a'), 'b:', match.group('b'))

a: abc b: //def//ghi
a: abc b: //def
a: abc b: None
kjfletch
A: 

Why use group matching at all? Why not just split by "//", either as a regex or a plain string?

use strict;

my $str = 'abc//def//ghi';
my $short = 'abc';

print "The first:\n";
my @groups = split(/\/\//, $str, 2);
foreach my $val (@groups) {
    print "$val\n";
}

print "The second:\n";
@groups = split(/\/\//, $short, 2);
foreach my $val (@groups) {
    print "$val\n";
}

gives

The first:
abc
def//ghi
The second:
abc

[EDIT: Fixed to return max 2 groups]
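
The same idea can be sketched in Python without any regex. This is only an illustration, not part of the Perl answer: split_first is a made-up helper name, and it glues the separator back onto the second part because the question wants b to start with //.

```python
def split_first(text, sep="//"):
    """Hypothetical helper: split at the first sep, keeping sep on the second part."""
    # str.partition returns (head, sep, tail), or (text, '', '') if sep is absent.
    head, found, tail = text.partition(sep)
    return head, found + tail

print(split_first("abc//def//ghi"))  # ('abc', '//def//ghi')
print(split_first("abc"))            # ('abc', '')

# The direct counterpart of the Perl split with a limit of 2:
print("abc//def//ghi".split("//", 1))  # ['abc', 'def//ghi']
print("abc".split("//", 1))            # ['abc']
```

Note that the limit is counted differently: Perl's third argument is the maximum number of fields, while Python's maxsplit is the maximum number of splits, so Perl's 2 corresponds to Python's 1.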

gnud
All // after the first // are to be ignored.
mafutrct
I didn't catch that. I still think my solution is the easiest to understand: use the limit parameter present in most split functions.
gnud