I want to find doubled words in a text. I used (\w+) +\1
It works, but it only finds "abc abc" in the text.
I also want to find "abc def abc def".
Thanks.
Not sure exactly what you want it to match, but it could be as simple as changing it to:
(\w+) +.*\1
The .* will match any extra characters that might be in between.
This will match the 'abc def abc' part of 'abc def abc def'. If you want to match all of it, change it to:
(\w+) +.*\1.*
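For what it's worth, here is a quick way to check what these patterns actually grab. It is only a sketch using Python's `re` module (the thread doesn't say which regex flavor is in use, so that part is an assumption), and the variable names are made up for illustration.

```python
import re

text = "abc def abc def"

# Original pattern: the repeat must follow immediately, so nothing is found here.
adjacent = re.search(r"(\w+) +\1", text)

# With .* in between, the first word may reappear anywhere later on the line.
loose = re.search(r"(\w+) +.*\1", text)

# A trailing .* stretches the match to the end of the line.
whole = re.search(r"(\w+) +.*\1.*", text)

print(adjacent)         # None
print(loose.group(0))   # abc def abc
print(whole.group(0))   # abc def abc def
```

Keep in mind that nothing here is tied to word boundaries, so the same pattern would also report "cat scat" in "cat scatter".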
"(\w.*) +\1" maybe? Or does this get too general for your needs?
"(\w+(?:\s+\w+)*) +\1" might work as well.
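To see how the second pattern behaves on the example from the question, here is a small sketch. It assumes Python's `re` module, and `find_doubled_phrases` is a hypothetical helper name, not something from the thread.

```python
import re

# Hypothetical helper; the pattern is the multi-word suggestion from this answer.
DOUBLED = re.compile(r"(\w+(?:\s+\w+)*) +\1")

def find_doubled_phrases(text):
    """Return each word or phrase that is immediately followed by a copy of itself."""
    return [m.group(1) for m in DOUBLED.finditer(text)]

print(find_doubled_phrases("say abc def abc def now"))  # ['abc def']
print(find_doubled_phrases("abc abc"))                  # ['abc']
```

One caveat: without word boundaries the pattern can start in the middle of a word, so "this is" is reported as a doubled "is". Wrapping the pattern in \b on both sides is one possible way around that.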
The following regex will match any repeated sequence of characters:
/(.+).*?\1/
If you only want repeated sequences that have nothing but whitespace in between, then use this instead:
/(.+)\s+?\1/
If you only want words separated by whitespace, change the (.+) to a (\w+):
/(\w+)\s+?\1/
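A side-by-side check of these three variants on the string from the question might look like the following. This is a sketch assuming Python's `re` module; the dictionary labels and the `repeated_part` helper are invented for the example.

```python
import re

# Labels are illustrative only; the patterns are the three given so far in this answer.
PATTERNS = {
    "any characters between": r"(.+).*?\1",
    "whitespace between": r"(.+)\s+?\1",
    "single word": r"(\w+)\s+?\1",
}

def repeated_part(pattern, text):
    """Return the captured repeated text, or None when the pattern finds nothing."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

results = {name: repeated_part(p, "abc def abc def") for name, p in PATTERNS.items()}
print(results["any characters between"])  # abc def
print(results["whitespace between"])      # abc def
print(results["single word"])             # None
```

The first pattern is very permissive: on "hello" it captures "l", because a single repeated letter already counts as a repeated sequence.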
If you want to look at words while ignoring things like punctuation, word boundaries might be more useful:
/(\b\w+?\b).+?\b\1\b/
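Here is a rough sketch of the word-boundary idea, again assuming Python's `re` module. Two things are assumptions on my part: the gap between the two words is written as an unescaped `.+?` so that any characters (not just literal dots) may sit in between, and `first_repeated_word` is a hypothetical helper name.

```python
import re

# Assumption: unescaped .+? between the words, so punctuation and spaces are allowed.
WORD_AGAIN = re.compile(r"(\b\w+?\b).+?\b\1\b")

def first_repeated_word(text):
    """Return the first whole word that shows up again later, or None."""
    m = WORD_AGAIN.search(text)
    return m.group(1) if m else None

print(first_repeated_word("abc, abc"))        # abc
print(first_repeated_word("this is a test"))  # None

# For comparison, the version without boundaries matches inside "this":
print(re.search(r"(\w+)\s+?\1", "this is a test").group(1))  # is
```

The \b anchors are what keep "this is" from being treated as a doubled "is".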
Are you trying to delete the duplicates? You can also check this answer.