ansaurus

Question

How do I choose a regular expression match at an arbitrary index?

Answer 1

+2 A:

Given a string like "ABC-DEF-GHI-JKL-MNO", the second set of parens in each of these patterns captures "DEF", "GHI", and "JKL", respectively:

([^-]+-){1}([^-]+)
([^-]+-){2}([^-]+)
([^-]+-){3}([^-]+)
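A quick way to see which group holds what is to run the patterns. This is only a sketch in Python (the question doesn't name an engine, and the helper name groups_after_skip is made up for illustration). It assumes the pattern is applied at the start of the string and that the string has enough blocks to match.

```python
import re

def groups_after_skip(text, skip):
    """Return (group 1, group 2) for the pattern ([^-]+-){skip}([^-]+)."""
    # re.match only tries at position 0, i.e. counting starts at the first block.
    m = re.match(r"([^-]+-){%d}([^-]+)" % skip, text)
    # Assumes a match exists; m would be None for a string with too few blocks.
    return m.group(1), m.group(2)

text = "ABC-DEF-GHI-JKL-MNO"
for skip in (1, 2, 3):
    print(skip, groups_after_skip(text, skip))
```

Group 1 only keeps the last "block plus dash" it repeated over (for example "DEF-" when skip is 2), while group 2 holds the block you actually want.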

If this is Perl, make the first set of parens non-capturing, so that the block you want ends up in the first (and only) capture group, i.e.:

# perl -de 0
$_="ABC-DEF-GHI-JKL-MNO"
p /(?:[^-]+-){1}([^-]+)/
  DEF
p /(?:[^-]+-){2}([^-]+)/
  GHI
p /(?:[^-]+-){3}([^-]+)/
  JKL

$_="ABC-DEF01-GHI54677-JKL!9988-MNOP"
p /(?:[^-]+-){1}([^-]+)/
  DEF01
p /(?:[^-]+-){2}([^-]+)/
  GHI54677
p /(?:[^-]+-){3}([^-]+)/
  JKL!9988

Explanation:

(?:  = non-capturing parens
[^-] = a non-dash character
+    = one or more
-    = a dash
)    = close paren
{3}  = repeat 3 times

This part "gobbles up" as many leading blocks as you ask for (1, 2, 3, or any number you like), leaving the capturing group that follows to take the one you're looking for.

Instead of +, you can also write {1,}, meaning one or more.

If your blocks can be empty, as in:

ABC--GHI-JKL

and you want the second block, which is "" (the empty string), use * instead of +. You can also write {0,}, meaning zero or more.
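To make the difference concrete, here is a small Python sketch (Python is just for illustration). The ^ anchor and the helper first_group are my additions, not part of the patterns above. The anchor is there because an unanchored + pattern doesn't simply fail on an empty block: the engine can start matching further along the string and hand back a later block instead.

```python
import re

text = "ABC--GHI-JKL"

def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

# * allows empty blocks, so the second block is found as "".
star = first_group(r"^(?:[^-]*-){1}([^-]*)", text)
# + requires non-empty blocks; anchored at the start, it finds no match.
plus_anchored = first_group(r"^(?:[^-]+-){1}([^-]+)", text)
# Without the anchor, the match slides right to "GHI-" and captures "JKL".
plus_unanchored = first_group(r"(?:[^-]+-){1}([^-]+)", text)

print(repr(star), repr(plus_anchored), repr(plus_unanchored))  # '' None 'JKL'
```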

eruciform 2010-07-19 03:51:47

This answered my question but then I realised I'd written it incorrectly. My apologies, it's now been clarified.

Alex Angas 2010-07-19 03:57:32

no problem, just change `{3}` to `+` or `{1,}` if you mean 1-or-more-non-dash-things, or `*` or `{0,}` if you mean zero-or-more

eruciform 2010-07-19 04:02:38

answer updated...

eruciform 2010-07-19 04:19:04

Answer 2

+1 A:

You didn't specify what language/regular expression engine you're using, but some (most?) let you apply a match to the same string repeatedly, with each match picking up where the previous one ended. For example, pcrecpp lets you do:

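#include <pcrecpp.h>
#include <string>

pcrecpp::StringPiece piece("ABC-DEF-GHI-JKL-MNO");
pcrecpp::RE re("([^-]+)-?");   // one block, plus its trailing dash if present
unsigned int index = 3; // e.g., for GHI

std::string group;
// Each Consume call matches at the front of piece and advances past the match
for(unsigned int i = 0; i < index; i++)
    re.Consume(&piece, &group);

// group now contains "GHI". Calling Consume again would give it JKL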
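The same "keep matching and count" idea can be sketched in other engines too. Here is a rough Python equivalent using re.finditer. The function name nth_match is made up for illustration, and it assumes a 1-based index of at least 1.

```python
import re
from itertools import islice

def nth_match(pattern, text, index):
    """Return group 1 of the index-th match (1-based), or None if there are fewer."""
    matches = re.finditer(pattern, text)
    # Skip index - 1 matches, then take the next one if it exists.
    m = next(islice(matches, index - 1, None), None)
    return m.group(1) if m else None

print(nth_match(r"([^-]+)-?", "ABC-DEF-GHI-JKL-MNO", 3))  # GHI
```

Like the pcrecpp loop, this needs code around the expression, so it only helps if you're free to write some.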

Michael Mrozek 2010-07-19 03:53:37

Thanks, unfortunately I can't use any additional code - it all has to be done within the expression. I've now clarified the question.

Alex Angas 2010-07-19 03:58:33

Answer 3

A:

Different answer based on your revision: Do you just want this?

(?:[^-]+-){index-1}([^-]+)

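Here index-1 is a placeholder, not regex syntax: substitute the actual number, so for index=3 you write {2}. The non-capturing group matches index-1 of the subblocks, so for index=3 it matches ABC-DEF01-, and then the capturing group matches GHI54677.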
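If the expression is generated rather than typed by hand, filling in the number might look like this Python sketch. The names pattern_for and block_at are hypothetical, and the index is assumed to be 1-based.

```python
import re

def pattern_for(index):
    """Build the pattern that captures the index-th dash-separated block (1-based)."""
    if index < 1:
        raise ValueError("index must be 1 or greater")
    # The non-capturing group skips index - 1 blocks; group 1 is the wanted block.
    return r"(?:[^-]+-){%d}([^-]+)" % (index - 1)

def block_at(text, index):
    """Return the index-th block, or None if the string has too few blocks."""
    m = re.match(pattern_for(index), text)
    return m.group(1) if m else None

print(pattern_for(3))                                    # (?:[^-]+-){2}([^-]+)
print(block_at("ABC-DEF01-GHI54677-JKL!9988-MNOP", 3))   # GHI54677
```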

Michael Mrozek 2010-07-19 04:00:51
