I have a list of strings. Some of them are of the form 123-...456. The variable portion "..." may be:

  • the string "apple" followed by a hyphen, e.g. 123-apple-456
  • the string "banana" followed by a hyphen, e.g. 123-banana-456
  • a blank string, e.g. 123-456 (note there's only one hyphen)

Any word other than "apple" or "banana" is invalid.

For these three cases, I would like to match "apple", "banana", and "", respectively. Note that I never want to capture the hyphen, but I always want to match it. If the string is not of the form 123-...456 as described above, then there should be no match at all.

How do I write a regular expression to do this? Assume I have a flavor that allows lookahead, lookbehind, lookaround, and non-capturing groups.


The key observation here is that when you have either "apple" or "banana", you must also have the trailing hyphen, but you don't want to match it. And when you're matching the blank string, you must not have the trailing hyphen. A regex that encapsulates this assertion will be the right one, I think.

A: 

Try this:

/\d{3}-(?:(apple|banana)-)?\d{3}/
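A minimal Python sketch of how this pattern could be exercised. The helper name `extract_word` and the use of `re.fullmatch` are assumptions of this sketch, not part of the answer; `fullmatch` is used so that the whole string must have the required form. The optional non-capturing group consumes the hyphen, while group 1 holds only the word.

```python
import re

# Same pattern as above, without the / delimiters.
PATTERN = re.compile(r"\d{3}-(?:(apple|banana)-)?\d{3}")

def extract_word(s):
    """Return 'apple', 'banana', or '' for a valid string, else None.

    Hypothetical helper for illustration only.
    """
    m = PATTERN.fullmatch(s)
    if m is None:
        return None
    # Group 1 is None when the optional group did not take part in the
    # match (the "123-456" case), so it is normalised to "".
    return m.group(1) or ""

for s in ["123-apple-456", "123-banana-456", "123-456",
          "123-coconut-456", "123--456"]:
    print(s, "->", repr(extract_word(s)))
```

Note that in Python an unset group is reported as `None` rather than `""`; other flavors may differ.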
slosd
This is not correct since it matches, for example, "123-coconut-456".
David Stone
@david: how's that different from your "banana" example?
SilentGhost
@SilentGhost: I *only* want to capture `apple` or `banana` or "". All other values are invalid, as I stated.
David Stone
sry, in that case: **/\d{3}-(?:(apple|banana)-)?\d{3}/**
slosd
A: 

Try:

123-(?:(apple|banana)-)?456

That will match apple, banana, or a blank string, followed by zero or one hyphens. I was wrong about not needing a capturing group. Silly me.

Thomas
This is not correct since it matches, for example, "123-coconut-456".
David Stone
Thought you wanted it more general...fixed.
Thomas
this will match `'123--456'`
SilentGhost
Woops...fixed that as well.
Thomas
+3  A: 

The only way to avoid capturing something is to use look-around assertions:

(?<=123-)((apple|banana)(?=-456)|(?=456))

This is necessary because even with non-capturing groups (?:…), the overall match of the regular expression still includes the text those groups matched. This regular expression, by contrast, matches only apple or banana when it is preceded by 123- and followed by -456, or the empty string when it is preceded by 123- and followed by 456.
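As a rough illustration, the look-around pattern can be tried in Python, whose `re` module supports fixed-width lookbehind such as `(?<=123-)`. The helper name `matched_text` is an assumption of this sketch. Here the overall match (group 0) is itself the word or the empty string, with no hyphens included.

```python
import re

# The look-around pattern from this answer, unchanged.
LOOKAROUND = re.compile(r"(?<=123-)((apple|banana)(?=-456)|(?=456))")

def matched_text(s):
    """Return the overall match (group 0), or None if nothing matches.

    Hypothetical helper for illustration only.
    """
    m = LOOKAROUND.search(s)
    return None if m is None else m.group(0)

for s in ["123-apple-456", "123-banana-456", "123-456",
          "123-coconut-456", "123--456"]:
    print(s, "->", repr(matched_text(s)))
```

One caveat: the pattern is not anchored, so a longer string that merely contains 123-456 would also produce an empty match. If the whole string has to fit the form, anchors or word boundaries would need to be added inside the look-arounds.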

Gumbo
+1 — In this case, you can work around that by using group 1 rather than group 0, but this is an excellent (and subtle!) distinction.
Ben Blank
@Ben Blank: It definitely depends on how “match” and “capture” are interpreted.
Gumbo
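To make the "match" versus "capture" distinction from these comments concrete, here is a small Python sketch. The pattern with a non-capturing group is used only as an illustration, and the variable names are assumptions. Group 0 is the whole matched text, hyphens included, while group 1 is only the captured word.

```python
import re

# Non-capturing group around the word and its trailing hyphen;
# only the word itself is captured as group 1.
m = re.fullmatch(r"123-(?:(apple|banana)-)?456", "123-apple-456")

whole = m.group(0)  # everything the pattern matched
word = m.group(1)   # just the captured word

print(repr(whole), repr(word))
```

So with a non-capturing group the hyphen is still part of the match, but reading group 1 instead of group 0 gives the word alone.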