Is there a way using a regex to match a repeating set of characters? For example:

ABCABCABCABCABC

ABC{5}

I know that's wrong. But is there anything to match that effect?

Update:

Can you use nested capture groups? So something like (?<cap>(ABC){5})?

+8  A: 

Enclose the regex you want to repeat in parentheses. For instance, if you want 5 repetitions of ABC:

(ABC){5}

Or if you want any number of repetitions (0 or more):

(ABC)*

Or one or more repetitions:

(ABC)+
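A quick way to check these three forms yourself, sketched with Python's re module (the variable names are only for illustration, and I'm using re.fullmatch so the whole string has to match):

```python
import re

text = "ABCABCABCABCABC"

# {5} applies to the whole parenthesized group, so this needs exactly five ABCs.
five = re.fullmatch(r"(ABC){5}", text)
four_is_too_few = re.fullmatch(r"(ABC){5}", "ABC" * 4)

# * accepts zero repetitions, so it matches the empty string; + does not.
star_on_empty = re.fullmatch(r"(ABC)*", "")
plus_on_empty = re.fullmatch(r"(ABC)+", "")

print(five.group(0))                # ABCABCABCABCABC
print(four_is_too_few is not None)  # False
print(star_on_empty is not None)    # True
print(plus_on_empty is not None)    # False
```

With re.search instead of re.fullmatch, (ABC)* would "succeed" on any input at all by matching an empty string, which is usually not what you want.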

edit to respond to update

Parentheses in regular expressions do two things; they group together a sequence of items in a regular expression, so that you can apply an operator to an entire sequence instead of just the last item, and they capture the contents of that group so you can extract the substring that was matched by that subexpression in the regex.

You can nest parentheses; they are counted from the first opening paren. For instance:

>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(0)
'123 ABCDEF'
>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(1)
'ABCDEF'
>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(2)
'DEF'

If you would like to group without capturing, you can use (?: instead of a plain opening paren. This is helpful when the parentheses are there only so that an operator applies to a whole sequence, and you don't want them to change the numbering of your capture groups. It can also be faster.

>>> re.search('[0-9]* (?:ABC(...))', '123 ABCDEF 456').group(1)
'DEF'

So to answer your update, yes, you can use nested capture groups, or even avoid capturing with the inner group at all:

>>> re.search('((?:ABC){5})(DEF)', 'ABCABCABCABCABCDEF').group(1)
'ABCABCABCABCABC'
>>> re.search('((?:ABC){5})(DEF)', 'ABCABCABCABCABCDEF').group(2)
'DEF'
Brian Campbell
I would use + here instead of *, because * will match 0 occurrences of (ABC).
Robusto
(ABC){3,5} also for a range of repetitions
Yanick Rochon
Oh duh. I was thinking ( ) is only used for that capture stuff for some reason. But that makes sense. Chosen for being first
Falmarri
@Falmarri `()` are used both for grouping and for capture. If you want to do grouping without capture (this would be useful if you already have other captures and don't want to change their numbering), you can use `(?:ABC)`.
Brian Campbell
I updated my question. Is that how to do what my update wants to do?
Falmarri
+1 ..don't forget (ABC|XYZ){5} for various groups of characters.
John Isaacks
@Brian. That's what I would do in this case since capturing isn't needed and the `(?:non capturing)` groups are actually faster.
Steve Wortham
@Falmarri I've updated my answer to respond to your updated question. I wasn't exactly sure what you were asking; I hope that answers it, though.
Brian Campbell
Yeah that's what I was asking, thanks
Falmarri
+3  A: 

(ABC){5} should work for you.

Novikov
+1  A: 

Parentheses "()" are used to group characters and expressions within larger, more complex regular expressions. Quantifiers that immediately follow the group apply to the whole group.

(ABC){5}
pyfunc
+2  A: 

ABC{5} matches ABCCCCC. To match 5 ABC's, you should use (ABC){5}. Parentheses are used to group a set of characters. You can also set an interval for occurrences like (ABC){3,5} which matches ABCABCABC, ABCABCABCABC, and ABCABCABCABCABC.

(ABC){1,} means 1 or more repetitions, which is exactly the same as (ABC)+.

(ABC){0,} means 0 or more repetitions, which is exactly the same as (ABC)*.
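If you want to convince yourself of the above, here is a small sketch in Python (the sample strings and variable names are made up for the check, not taken from any real input):

```python
import re

# Without parentheses the quantifier binds to the final C only.
unparenthesized = re.fullmatch(r"ABC{5}", "ABCCCCC")

# {3,5} accepts three, four, or five repetitions and nothing else.
interval = re.compile(r"(ABC){3,5}")
accepted_counts = [n for n in range(1, 8) if interval.fullmatch("ABC" * n)]

# {1,} behaves like +, and {0,} behaves like *, on a few sample inputs.
samples = ["", "ABC", "ABCABC", "ABCAB"]
plus_agrees = all(
    bool(re.fullmatch(r"(ABC){1,}", s)) == bool(re.fullmatch(r"(ABC)+", s))
    for s in samples
)
star_agrees = all(
    bool(re.fullmatch(r"(ABC){0,}", s)) == bool(re.fullmatch(r"(ABC)*", s))
    for s in samples
)

print(unparenthesized is not None)  # True
print(accepted_counts)              # [3, 4, 5]
print(plus_agrees, star_agrees)     # True True
```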

Zafer
A: 

As to the update to the question:

You can nest capture groups. The capture group index is incremented per open paren.

(((ABC)*)(DEF)*)

If you feed that regex ABCABCABCDEFDEFDEF, capture group 0 matches the whole thing, 1 is also the whole thing, 2 is ABCABCABC, 3 is ABC, and 4 is DEF (3 and 4 each hold a single repetition because the star is outside of the capture group).
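Here is that numbering checked in Python, as a rough sketch. The second half is my own addition for the named group in the question's update: Python's re spells named groups (?P<name>...), so I'm assuming that syntax in place of the (?<cap>...) form from the question.

```python
import re

m = re.fullmatch(r"(((ABC)*)(DEF)*)", "ABCABCABCDEFDEFDEF")

# Groups are numbered by the position of their opening paren.
groups = [m.group(i) for i in range(5)]
for index, value in enumerate(groups):
    print(index, value)
# 0 ABCABCABCDEFDEFDEF
# 1 ABCABCABCDEFDEFDEF
# 2 ABCABCABC
# 3 ABC
# 4 DEF

# A named outer group wrapping a non-capturing repeated group.
named = re.fullmatch(r"(?P<cap>(?:ABC){5})", "ABCABCABCABCABC")
print(named.group("cap"))  # ABCABCABCABCABC
```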

If you have variation inside a capture group and a repeat just outside, then things can get a little wonky if you're not expecting it...

(a[bc]*c)*

when fed abbbcccabbc will return only the last repetition as capture group 1, in this example just the abbc, since the capture group is overwritten each time the repeat operator goes around again.
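To see this in Python, and one possible way around it if you actually want every repetition: run the inner pattern on its own with re.findall. That workaround is my suggestion rather than something built into the repeat operator, and the variable names are just illustrative.

```python
import re

text = "abbbcccabbc"

m = re.fullmatch(r"(a[bc]*c)*", text)
whole = m.group(0)      # the full match spans both repetitions
last_only = m.group(1)  # group 1 keeps only the final repetition

# Matching the inner pattern by itself yields each repetition separately.
pieces = re.findall(r"a[bc]*c", text)

print(whole)      # abbbcccabbc
print(last_only)  # abbc
print(pieces)     # ['abbbccc', 'abbc']
```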

dash-tom-bang