Say I have the following pattern:

INDICATOR\s+([a-z0-9]+)

which would match for example:

INDICATOR AA or INDICATOR B3

I need to edit this pattern so that it matches any string that starts with INDICATOR, followed by a space and then one or more matches of the inner pattern, e.g.

INDICATOR AA A3 66 B8 34 CD
INDICATOR BG 4D CS
INDICATOR HG

Is it possible to do this?

Solution

With thanks to Gumbo I came up with the following regex which suits my requirements:

INDICATOR((\s+)?([,-])?(\s+)?([a-z0-9]+))+

+2  A: 

Try this:

INDICATOR(\s+([a-z0-9]+))+

Here the repeating pattern is wrapped in a group and quantified with + to allow one or more repetitions of the expression inside the group. Note that this won’t give you every match of the inner group, only the last one (to be more specific, it depends on the implementation you’re using).
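
As a rough illustration of the "only the last match" behavior, here is a minimal Python sketch. Python and the `re.IGNORECASE` flag are assumptions (the thread never names a language, and the uppercase samples suggest case-insensitive matching); other engines may behave differently.

```python
import re

# Assumption: case-insensitive matching, because the sample inputs are uppercase
# while the character class is [a-z0-9].
PATTERN = re.compile(r"INDICATOR(\s+([a-z0-9]+))+", re.IGNORECASE)

match = PATTERN.fullmatch("INDICATOR AA A3 66")

whole = match.group(0)       # the entire matched string
last_token = match.group(2)  # inner group: only its final repetition is kept

print(whole)
print(last_token)
```

In Python's `re` module, `group(2)` holds just the final token, so the earlier repetitions are not available from the match object.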

Gumbo
@Gumbo: Basically I just need to match the full string and then I will reparse it using only the inner regex to get the data out. This initial regex is for validation.
James
@James: Then I suggest using so-called non-capturing groups `(?:…)` instead of “normal” groups to avoid the cost of storing the captured string.
Gumbo
@Gumbo: Does this give me the advantage of not having to re-parse the string, so that I can just iterate over the groups?
James
No, you will always have to re-parse the string if you want the individual values. Regex cannot capture each repeated value in that manner. If you use a quantifier on a capture group, the group only contains the last value matched.
Cags
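
The validate-then-re-parse approach discussed in these comments could look roughly like the sketch below. It is written in Python as an assumption, and `parse_indicator` is a hypothetical helper name, not something from the thread.

```python
import re

# Validation pattern with a non-capturing group, since only a yes/no answer is needed.
# Assumption: case-insensitive matching, as the sample inputs are uppercase.
VALIDATE = re.compile(r"INDICATOR(?:\s+[a-z0-9]+)+", re.IGNORECASE)
# The inner pattern, reused on its own to pull out each value.
TOKEN = re.compile(r"[a-z0-9]+", re.IGNORECASE)


def parse_indicator(line):
    """Return the tokens after INDICATOR, or None if the line is not valid."""
    if VALIDATE.fullmatch(line) is None:
        return None
    # Skip the literal keyword, then collect every token with the inner pattern.
    rest = line[len("INDICATOR"):]
    return TOKEN.findall(rest)


print(parse_indicator("INDICATOR AA A3 66 B8 34 CD"))
print(parse_indicator("INDICATOR"))
```

The first pass only validates the whole string; the second pass does the extraction that a single quantified capture group cannot.
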
Say the pattern might have a different separator instead of a space i.e. a comma or a hyphen. Would I just replace the `\s+` with something like `[(\s+),-]` ?
James
@James: If you want to allow one or more spaces but only a single comma or hyphen, you would need `(\s+|[,-])`. Otherwise, if the separator is always just one character, `[\s,-]`.
Gumbo
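
To make the difference between the two separator forms concrete, here is a small Python sketch. The language, the `re.IGNORECASE` flag, and the names `RUN_OR_SINGLE` and `SINGLE_CHAR` are assumptions for illustration only.

```python
import re

# Separator is either a run of whitespace or exactly one comma/hyphen.
RUN_OR_SINGLE = re.compile(r"INDICATOR(?:(?:\s+|[,-])[a-z0-9]+)+", re.IGNORECASE)
# Separator is always exactly one character: whitespace, comma, or hyphen.
SINGLE_CHAR = re.compile(r"INDICATOR(?:[\s,-][a-z0-9]+)+", re.IGNORECASE)

two_spaces = "INDICATOR  AA"    # two spaces before the token
mixed = "INDICATOR AA,BB-CC"    # one separator character between tokens
comma_space = "INDICATOR AA, BB"  # comma followed by a space

print(RUN_OR_SINGLE.fullmatch(two_spaces) is not None)
print(SINGLE_CHAR.fullmatch(two_spaces) is not None)
print(RUN_OR_SINGLE.fullmatch(mixed) is not None)
print(RUN_OR_SINGLE.fullmatch(comma_space) is not None)
```

Neither form accepts a comma followed by a space, because each repetition allows only one of the alternatives as its separator.
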
@Cags: There are implementations that store each single match of a repetition (I think [.NET’s CaptureCollection](http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.capturecollection\(v=VS.71\).aspx) provides such a behavior).
Gumbo
@Gumbo , ah true, for some reason I'd got it in my head he was using PHP.
Cags
I ended up with the following: `INDICATOR((\s+)?([,-])?(\s+)?([a-z0-9]+))+`. This allows all of the variations `INDICATOR AA, BB`, `INDICATOR AA,BB`, `INDICATOR AABB`, `INDICATOR AA , BB` and `INDICATOR AA BB`, which keeps my pattern flexible. I need to try to support all of these situations because I can't guarantee that users will send the information in a strict format.
James
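
A quick way to check the flexible pattern against those variations is sketched below in Python. The language and the `re.IGNORECASE` flag are assumptions, as the thread never states which engine or flags are in use.

```python
import re

# The flexible pattern from the comment above.
# Assumption: case-insensitive matching, since the samples are uppercase.
FLEXIBLE = re.compile(r"INDICATOR((\s+)?([,-])?(\s+)?([a-z0-9]+))+", re.IGNORECASE)

samples = [
    "INDICATOR AA, BB",
    "INDICATOR AA,BB",
    "INDICATOR AABB",
    "INDICATOR AA , BB",
    "INDICATOR AA BB",
]

# Map each sample to whether the whole string matches.
results = {s: FLEXIBLE.fullmatch(s) is not None for s in samples}
for sample, ok in results.items():
    print(sample, ok)

# Every separator part is optional, so no space is required after the keyword.
no_separator = FLEXIBLE.fullmatch("INDICATORAA") is not None
# Two commas in a row are rejected, since each repetition allows at most one.
double_comma = FLEXIBLE.fullmatch("INDICATOR AA,,BB") is not None
print(no_separator, double_comma)
```

One consequence worth knowing: because all separator parts are optional, the pattern also accepts input with no space after `INDICATOR`.
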
@Gumbo: Just one more thing (not really a necessity, but it would be nice). Is it possible to restrict the number of patterns I match? I.e., instead of matching all of them, can I restrict it to only match the first 20?
James
@James: Yes, you can use the quantifier `{n,m}` to allow n to m repetitions. So try `INDICATOR(\s*[\s,-]\s*[a-z0-9]+){1,20}`.
Gumbo
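
The `{1,20}` limit can be tried out with a short Python sketch such as the one below. Python, the `re.IGNORECASE` flag, and the helper name `within_limit` are assumptions. The sketch writes the character class as `[\s,-]`, with the hyphen last so it is taken literally; Python's `re` rejects a class in which a hyphen sits between a comma and `\s`, as a bad character range.

```python
import re

# Between 1 and 20 tokens, each preceded by a separator with optional surrounding whitespace.
# Assumption: case-insensitive matching, since the samples are uppercase.
LIMITED = re.compile(r"INDICATOR(?:\s*[\s,-]\s*[a-z0-9]+){1,20}", re.IGNORECASE)


def within_limit(line):
    """Return True if the whole line is INDICATOR followed by 1 to 20 tokens."""
    return LIMITED.fullmatch(line) is not None


twenty = "INDICATOR " + " ".join(["A1"] * 20)
twenty_one = "INDICATOR " + " ".join(["A1"] * 21)

print(within_limit(twenty))
print(within_limit(twenty_one))
```

With a whole-string match, a line with more than 20 tokens is rejected outright. To accept the line but keep only the first 20 values, one option is to match from the start without anchoring the end and then re-parse just the matched portion.
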
@Gumbo: Excellent, thanks for your time!
James