I want to find doubled words in a text. I used (\w+) +\1 and it works, but it only finds a single repeated word such as "abc abc".

I also want to find repeated phrases such as "abc def abc def".

Thanks.

+1  A: 

Not sure what you want it to match but it could be as simple as changing it to:

(\w+) +.*\1

The .* will match any extra characters that might be in between.

This will match the 'abc def abc' part of 'abc def abc def'. If you want to match all of it, change it to:

(\w+) +.*\1.*
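
To see what these two patterns actually capture, here is a minimal sketch using Python's `re` module. The sample strings and variable names are only illustrative and are not from the question.

```python
import re

text = "abc def abc def"  # illustrative sample, not from the question

# (\w+) +.*\1 : a word, spaces, any characters, then the same word again
m1 = re.search(r"(\w+) +.*\1", text)
print(m1.group(0))  # abc def abc

# Appending .* extends the match to the end of the line
m2 = re.search(r"(\w+) +.*\1.*", text)
print(m2.group(0))  # abc def abc def

# Caveat: .* accepts any text in between, so non-adjacent repeats match too
m3 = re.search(r"(\w+) +.*\1", "abc xyz qrs abc")
print(m3.group(0))  # abc xyz qrs abc
```

Note that the last case shows these patterns find a word that recurs anywhere later on the line, which may be broader than "doubled" text.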

Salgar
Thanks for your answer, but it didn't work. I then tried "((\w| )+) +\1" and it works, but it also finds runs of spaces alone (more than 3 spaces).
WhoSayIn
+1  A: 

"(\w.*) +\1" maybe? or does this get too general for your needs?

"(\w+(?:\s+\w+)*) +\1" might work as well.

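A small sketch of how the second pattern behaves, again with Python's `re` module. The test strings are made up for illustration, and the `\b`-anchored variant at the end is an assumed refinement, not part of the answer above.

```python
import re

# One or more whitespace-separated words, spaces, then the same run again
phrase = re.compile(r"(\w+(?:\s+\w+)*) +\1")

print(phrase.search("abc abc").group(1))          # abc
print(phrase.search("abc def abc def").group(1))  # abc def
print(phrase.search("a    b"))                    # None (spaces alone do not match)

# Without word boundaries, the end of one word can pair with the next word
print(phrase.search("this is").group(0))          # is is

# Assumed refinement (hypothetical, not from the thread): anchor both ends with \b
bounded = re.compile(r"\b(\w+(?:\s+\w+)*) +\1\b")
print(bounded.search("this is"))                   # None
print(bounded.search("abc def abc def").group(1))  # abc def
```

Because the group must start with a word character, a run of spaces on its own can no longer be captured and repeated.
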
gnarf
Thanks! It works exactly as I wanted.
WhoSayIn
+3  A: 

The following regex will match any repeated sequence of characters:

/(.+).*?\1/

If you only want repeated sequences that have nothing but whitespace in between, then use this instead:

/(.+)\s+?\1/

If you only want words separated by whitespace, change the (.+) to a (\w+):

/(\w+)\s+?\1/

If you want to look at words ignoring things like punctuation, word borders might be more useful:

/(\b\w+?\b)\W+?\b\1\b/
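
As a rough illustration of how the first three patterns differ, here is a short Python sketch. The inputs are invented for the example, and Python's `re` takes the patterns without the surrounding slashes.

```python
import re

# Any repeated character sequence, even inside a single word
print(re.search(r"(.+).*?\1", "wikiwiki").group(1))   # wiki

# Requiring whitespace between the two copies rejects "wikiwiki"
print(re.search(r"(\w+)\s+?\1", "wikiwiki"))          # None
print(re.search(r"(\w+)\s+?\1", "abc abc").group(0))  # abc abc

# A single \w+ cannot span two words, so a doubled phrase is not found
print(re.search(r"(\w+)\s+?\1", "abc def abc def"))   # None
```

The last line is the limitation the question started from: matching a doubled phrase needs a group that can contain whitespace.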
Amber
.+?, so we don't match e.g. "wikiwiki"
Matthew Scharley
I think he wanted words; this will even match two whitespace characters in the same set of characters.
Salgar
Depends on which behavior the question-asker desires, but I'll add it as an option.
Amber
+1 for mentioning \b
gnarf
+1  A: 

Are you trying to delete the duplicates? You can also check this answer.

pageman