views:

856

answers:

2

I want to detect duplicated words that are right next to each other, even when there is punctuation in between.

For example:

Vivamus Vivamus diam, diam, Vivamus Vivamus diam, diam Vivamus

There should be 4 distinct hits here.

I can't figure out why this isn't working; can someone explain why and show me what the correct code should be?

thanks.

(\w*(?:[ ,\.])*?)\1


PS: due to the confusion it causes, I'm not going to say that I'm using the Perl engine.

+4  A: 

The (?: is a non-capturing parenthesis, meaning it won't store what it matches. You will need to use capturing parentheses.

(\w+)\W+\1
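
A quick way to check this pattern against the sample text from the question. This is only a sketch in Python's `re` module rather than Perl (an assumption; `\w`, `\W` and `\1` behave the same way for this input), and the variable names are made up for illustration.

```python
import re

# Sample text from the question.
text = "Vivamus Vivamus diam, diam, Vivamus Vivamus diam, diam Vivamus"

# A word, then one or more non-word characters, then the same word again.
pattern = re.compile(r"(\w+)\W+\1")

# Collect the full text of every hit.
hits = [m.group(0) for m in pattern.finditer(text)]
for hit in hits:
    print(hit)
```

This prints four hits: `Vivamus Vivamus`, `diam, diam`, `Vivamus Vivamus`, `diam, diam`. One caveat: without word boundaries the pattern can also fire inside longer words (for example on `the then`); writing it as `\b(\w+)\W+\1\b` is one possible guard.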
tj111
NNNEEeeeeAAAAAAAHhhhhhhhhh.....NOT WORD!!!! THAT'S IT!THANKS!!!
Keng
A: 

The original expression doesn't create a separate capture for the punctuation, but does include the punctuation in the first capture, so the \1 backreference has to match the same punctuation again. That means it would spot things like:

diam, diam, really, really, twice.

But you aren't really interested in the punctuation, so tj111's solution works properly, even though the '(?: ) is a non-capturing parenthesis' explanation is somewhat ... incomplete? The comment quoted is accurate, but it isn't why the overall regex failed.
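
To see the difference concretely, here is a small sketch that runs the original expression over the sample text from the question. It uses Python's `re` module instead of Perl (an assumption; the constructs involved behave the same for this input), and the variable names are hypothetical.

```python
import re

# Sample text from the question.
text = "Vivamus Vivamus diam, diam, Vivamus Vivamus diam, diam Vivamus"

# Pattern from the question: the punctuation sits inside capture group 1,
# so the backreference \1 must repeat the punctuation as well as the word.
original = re.compile(r"(\w*(?:[ ,\.])*?)\1")

# The pattern can also match the empty string, so drop empty hits.
original_hits = [m.group(0) for m in original.finditer(text) if m.group(0)]
print(original_hits)
```

Only three non-empty hits come back: `'Vivamus Vivamus '`, `'diam, diam, '` and `'Vivamus Vivamus '`. The last pair, `diam, diam Vivamus`, is missed because the first `diam` is followed by a comma and a space while the second is followed by a space only, so the captured text does not repeat exactly.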

Jonathan Leffler