views:

62

answers:

3

I'm a regular expression newbie, and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words, such as:

Paris in the the spring.

Not that that is related.

Why are you laughing? Are my my regular expressions THAT bad??

Is there a single regular expression that will match all of the repeated words above ("the the", "that that", "my my")?

Thanks in advance!

+2  A: 

No. That is an irregular grammar. There may be engine-/language-specific regular expressions that you can use, but there is no universal regular expression that can do that.

Ignacio Vazquez-Abrams
Although that is correct in the strict sense, I believe no regex engine in serious use today lacks support for grouping and back-references.
Tomalak
+4  A: 

Try this regular expression:

\b(\w+)\s+\1\b

Here \b is a word boundary and \1 references the captured match of the first group.
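
As a quick check, here is a minimal sketch that runs this pattern over the three sentences from the question using Python's `re` module. The choice of Python, the helper name `find_duplicates`, and the `re.IGNORECASE` flag (so that "The the" would also count) are assumptions for illustration, not part of the answer itself.

```python
import re

# The pattern from the answer. re.IGNORECASE is an added assumption so that
# differently-cased repeats such as "The the" are also reported.
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_duplicates(text):
    """Return every doubled-word span found in text (hypothetical helper)."""
    return [m.group(0) for m in DUPLICATE_WORD.finditer(text)]


samples = [
    "Paris in the the spring.",
    "Not that that is related.",
    "Why are you laughing? Are my my regular expressions THAT bad??",
]

for sentence in samples:
    print(find_duplicates(sentence))

# The trailing \b keeps a word from pairing with the start of a longer word.
print(find_duplicates("the then"))  # prints []
```

The trailing `\b` matters: without it, "the then" would be reported as a duplicate because `\1` would match the first three letters of "then".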

Gumbo
Makes me wonder; is it possible to do `\0` too? (Where `\0` is the whole regex, up to the current point OR where `\0` refers to the whole regex)
Pindatjuh
@Pindatjuh: No, I don’t think so because that sub-match would also be part of the whole match.
Gumbo
+2  A: 

The widely-used PCRE library can handle such situations (you won't achieve the same with POSIX-compliant regex engines, though):

(\b\w+\b)\W+\1
soulmerge
You need something to match the characters *between* the two words, like `\W+`. `\b` won't do it, because it doesn't consume any characters.
Alan Moore
Many thanks, fixed.
soulmerge