I'm searching for the pattern "(.*)\1" in the text "blabl" with regexec(). The call reports a successful match, but the regmatch_t structures describe empty matches. What exactly has been matched?

+5  A: 

The regex .* can successfully match a string of zero characters, that is, the empty string that sits between any two adjacent characters.

So your pattern is matching zero characters in the parentheses, and then \1 is matching those same zero characters immediately after that.
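
As a rough check of the point above, here is a minimal sketch. It assumes Python's re module as a stand-in for regexec() (a different engine, but the capture group and \1 behave the same way for this pattern), and the variable name m is only illustrative.

```python
import re

# Assumption: Python's re stands in for regexec(); this is not the POSIX API.
m = re.search(r"(.*)\1", "blabl")

# Whole match: offsets and text.
print(m.span(0), repr(m.group(0)))  # (0, 0) ''
# Group 1: the text that \1 then had to repeat.
print(m.span(1), repr(m.group(1)))  # (0, 0) ''
```

Both spans start and end at offset 0, which would correspond to a successful match whose regmatch_t entries have equal start and end offsets.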

So if your regex was /f(.*)\1/ it would match the string "foo" between the 'f' and the first 'o'.

You might try using .+ instead of .*, as that matches one or more characters instead of zero or more. (Using (.+)\1 you should match the 'oo' in 'foo'.)
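
A small sketch of what switching to .+ changes, again assuming Python's re in place of regexec(). The helper first_match is a hypothetical name made up for this illustration.

```python
import re

# Hypothetical helper for illustration: matched text, or None if no match.
def first_match(pattern, text):
    m = re.search(pattern, text)
    return m.group(0) if m else None

# "blabl" has no non-empty substring that is immediately repeated.
print(first_match(r"(.+)\1", "blabl"))   # None
# "foo" contains "o" followed by "o".
print(first_match(r"(.+)\1", "foo"))     # oo
# "blabla" is "bla" followed by "bla".
print(first_match(r"(.+)\1", "blabla"))  # blabla
```

With .+ the empty match is no longer possible, so the original text "blabl" simply fails to match.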

Mnebuerquo
Actually, /f(.*)\1/ matches 'foo' because the star is greedy. But it will also match just 'f'.
Alan Moore
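
The greedy behaviour described in the comment above can be checked with a short sketch. It assumes Python's re rather than regexec(), and the variable names are illustrative only.

```python
import re

# Assumption: Python's re stands in for regexec() here.
# On "foo", the greedy group first grabs "oo", then backs off to "o"
# so that \1 can match the second "o".
on_foo = re.search(r"f(.*)\1", "foo")
print(on_foo.group(0), repr(on_foo.group(1)))  # foo 'o'

# On just "f", the group and \1 both match the empty string.
on_f = re.search(r"f(.*)\1", "f")
print(on_f.group(0), repr(on_f.group(1)))      # f ''
```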
A: 

\1 is the backreference, typically used for replacement later or when trying to further refine your regex by getting a match within a match. You should just use (.*); this will give you the results you want and will automatically be given the backreference number 1. I'm no regex expert, but these are my thoughts based on my limited knowledge.

As an aside, I always go back to RegexBuddy when trying to see what's really happening.

Douglas Anderson
The intent may have been to match a string that appeared twice in a row in the text. For that the \1 would work.
Mnebuerquo
That makes sense. My limited knowledge of Regex shows through once again!
Douglas Anderson
A: 

\1 is the "re-match" instruction. The question is, do you want to re-match immediately (e.g., BLABLA)

/(.+)\1/

or later (e.g., BLAahem**BLA**)

/(.+).*\1/
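
The two variants above can be compared with a minimal sketch, assuming Python's re module instead of regexec() and using the plain text "BLAahemBLA" for the second case. The variable names are illustrative.

```python
import re

# Immediate re-match: the captured text must repeat right away.
adjacent = re.search(r"(.+)\1", "BLABLA")
print(adjacent.group(1), adjacent.group(0))  # BLA BLABLA

# Later re-match: .* allows any text between the two copies.
later = re.search(r"(.+).*\1", "BLAahemBLA")
print(later.group(1), later.group(0))        # BLA BLAahemBLA

# The immediate-only pattern finds nothing when text sits in between.
print(re.search(r"(.+)\1", "BLAahemBLA"))    # None
```

Because .* can also match nothing, the second pattern still accepts an immediate repeat such as BLABLA.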
harpo