
I have a string that looks like:

ABC-DEF01-GHI54677-JKL!9988-MNOP

Between each - can be virtually any character repeated any number of times.

I'm using this regular expression:

[^-]*

How do I make it match only the block at the 2nd index (e.g. DEF01)? Or the 3rd (GHI54677), or the 4th (JKL!9988)?

The engine I'm using doesn't let me specify a match index or additional code - it has to all be done within the expression.

A:

For the string ABC-DEF-GHI-JKL-MNO, the second set of parens will capture "DEF", "GHI", and "JKL", respectively:

([^-]+-){1}([^-]+)
([^-]+-){2}([^-]+)
([^-]+-){3}([^-]+)
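To see what each group actually holds, here is a minimal C++ sketch using std::regex, whose default ECMAScript syntax accepts these patterns unchanged (the helper name `groups_for` is hypothetical). Because the first group is repeated, it keeps only its last repetition. It therefore ends up holding the last skipped block plus its dash, while group 2 holds the block you want.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: apply ([^-]+-){k}([^-]+) for k = 1..3, exactly as
// written in the answer, and return (group 1, group 2).
// Returns a pair of empty strings if k is out of range or nothing matched.
std::pair<std::string, std::string> groups_for(const std::string& s, int k) {
    static const std::regex patterns[] = {
        std::regex("([^-]+-){1}([^-]+)"),
        std::regex("([^-]+-){2}([^-]+)"),
        std::regex("([^-]+-){3}([^-]+)"),
    };
    std::smatch m;
    if (k < 1 || k > 3 || !std::regex_search(s, m, patterns[k - 1]))
        return {"", ""};
    // Group 1 sits inside a repeat, so only its last repetition survives.
    return {m[1].str(), m[2].str()};
}
```

For k = 2 on ABC-DEF-GHI-JKL-MNO this should give "DEF-" in group 1 and "GHI" in group 2, which is why a non-capturing first group is tidier when the engine hands back every capture.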

If this is Perl, make the first set of parens non-capturing, so the block you want is the only thing captured:

# perl -de 0
$_="ABC-DEF-GHI-JKL-MNO"
p /(?:[^-]+-){1}([^-]+)/
  DEF
p /(?:[^-]+-){2}([^-]+)/
  GHI
p /(?:[^-]+-){3}([^-]+)/
  JKL

$_="ABC-DEF01-GHI54677-JKL!9988-MNOP"
p /(?:[^-]+-){1}([^-]+)/
  DEF01
p /(?:[^-]+-){2}([^-]+)/
  GHI54677
p /(?:[^-]+-){3}([^-]+)/
  JKL!9988

Explanation:

(?:  = non-capturing parens
[^-] = a non-dash character
+    = one or more
-    = a dash
)    = close paren
{3}  = repeat 3 times

This part "gobbles up" the first 1, 2, 3 (or however many you like) blocks together with their trailing dashes, leaving the next set of parens to capture the block you're looking for.

Instead of +, you can also use {1,}, meaning one or more.

If your blocks can be empty, as in:

ABC--GHI-JKL

and you want the second block, which is "" (the empty string), use * instead of +, or equivalently {0,}, meaning zero or more.
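To see why * matters here, the minimal C++ sketch below (std::regex with its default ECMAScript syntax; `first_group` is a hypothetical helper) runs both versions on ABC--GHI-JKL.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return group 1 of the first (leftmost) match of
// `pattern` in `s`, or "<no match>" if the pattern does not match at all.
std::string first_group(const std::string& s, const std::string& pattern) {
    std::smatch m;
    if (!std::regex_search(s, m, std::regex(pattern)))
        return "<no match>";
    return m[1].str();
}
```

With the + version, first_group("ABC--GHI-JKL", "(?:[^-]+-){1}([^-]+)") should return "JKL", not "". The pattern cannot match at the start of the string, so the unanchored search slides forward and silently reports a later block. The * version, "(?:[^-]*-){1}([^-]*)", returns the empty second block as intended. Anchoring the + version with ^ would turn that silent skip into a failed match instead.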

eruciform
This answered my question but then I realised I'd written it incorrectly. My apologies, it's now been clarified.
Alex Angas
no problem, just change `{3}` to `+` or `{1,}` if you mean 1-or-more-non-dash-things, or `*` or `{0,}` if you mean zero-or-more
eruciform
answer updated...
eruciform
A:

You didn't specify what language or regular expression engine you're using, but many of them let you apply a match repeatedly to the same string, consuming it as you go. For example, pcrecpp lets you do:

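#include <pcrecpp.h>
#include <string>

int main() {
    pcrecpp::StringPiece piece("ABC-DEF-GHI-JKL-MNO");
    pcrecpp::RE re("([^-]+)-?");  // one block, plus its trailing dash if any
    unsigned int index = 3;       // e.g., for GHI

    std::string group;
    for (unsigned int i = 0; i < index; i++)
        if (!re.Consume(&piece, &group))  // string has fewer blocks than index
            break;

    // group now contains "GHI". Calling Consume again would give "JKL".
}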
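For comparison, a similar iterative approach using only the C++ standard library might look like the minimal sketch below. It swaps in std::sregex_iterator for pcrecpp's Consume, and `nth_match` is a hypothetical helper. Like the pcrecpp version, it needs code outside the expression, so it only helps when the engine lets you walk through matches.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: return the n-th (1-based) run of non-dash characters,
// or an empty string if the input has fewer than n such runs.
std::string nth_match(const std::string& s, int n) {
    // [^-]+ rather than [^-]* so that every match is a non-empty block.
    static const std::regex block("[^-]+");
    int i = 0;
    for (std::sregex_iterator it(s.begin(), s.end(), block), end; it != end; ++it)
        if (++i == n)
            return it->str();
    return "";
}
```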
Michael Mrozek
Thanks, but unfortunately I can't use any additional code - it all has to be done within the expression. I've now clarified the question.
Alex Angas
A: 

A different answer, based on your revision: do you just want this?

(?:[^-]+-){index-1}([^-]+)

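Here {index-1} stands for a literal number, e.g. {2} for the 3rd block. The non-capturing group matches the first index-1 blocks (each with its trailing dash), so for index=3 it matches ABC-DEF01-, and the capturing group then matches GHI54677.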
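Because {index-1} has to be a literal number inside the pattern, any code that does get to build the pattern would substitute it before compiling. A minimal C++ sketch using std::regex, assuming index >= 1 and non-empty blocks (`block_at` is a hypothetical helper):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: build (?:[^-]+-){index-1}([^-]+) for a 1-based index,
// then return what the capturing group matched ("" if nothing matched).
std::string block_at(const std::string& s, int index) {
    assert(index >= 1 && "block indices are 1-based");
    const std::string pattern =
        "(?:[^-]+-){" + std::to_string(index - 1) + "}([^-]+)";
    std::smatch m;
    if (!std::regex_search(s, m, std::regex(pattern)))
        return "";
    return m[1].str();
}
```

On ABC-DEF01-GHI54677-JKL!9988-MNOP, indices 2, 3, and 4 should give DEF01, GHI54677, and JKL!9988, the same blocks as the Perl session in the first answer.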

Michael Mrozek