ansaurus

Question

Answer 1

+1 A:

The first match is "ThIS IS an example...", so m.end() points to the end of the second "is". I'm not sure why you use i for the start index; try m.start() instead.

To improve your regex, put \b before and after the word to require word boundaries: (\\b\\w+\\b), with the backslashes doubled because it is a Java string literal. Otherwise, as you're seeing, you'll get matches inside of words.
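
A minimal sketch of the difference word boundaries make. The original pattern is not quoted in this answer, so the unanchored regex below is only an assumed stand-in for it; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Returns the first match as "text@start-end", or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        // m.start() and m.end() are the offsets of the current match.
        return m.group() + "@" + m.start() + "-" + m.end();
    }

    public static void main(String[] args) {
        String input = "This is an example";
        // Assumed original pattern, no word boundaries: matches inside "This".
        System.out.println(firstMatch("(\\w+)\\s+\\1", input));
        // With word boundaries there is no repeated whole word to match.
        System.out.println(firstMatch("\\b(\\w+)\\b\\s+\\1\\b", input));
    }
}
```

Without the boundaries the match starts at offset 2, inside "This", which is the partial match described above.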

John Kugelman 2010-08-04 04:51:31

Answer 2

+3 A:

Try something like:

s = s.replaceAll("\\b(\\w+)\\b(\\s+\\1)+\\b", "$1");

That regex is a bit stronger than yours - it checks for whole words (no partial matches), and gets rid of any number of consecutive repetitions.
The regex captures a first word: \b(\w+)\b, and then attempts to match spaces and repetitions of that word: (\s+\1)+. The final \b is to avoid partial matching of \1, as in "for formatting".
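
For illustration, here is the same replaceAll call wrapped in a small runnable class. The class name, method name and sample strings are made up for this sketch.

```java
public class DuplicateWords {
    // Collapses any run of a repeated whole word down to a single occurrence.
    static String collapse(String s) {
        return s.replaceAll("\\b(\\w+)\\b(\\s+\\1)+\\b", "$1");
    }

    public static void main(String[] args) {
        // Three consecutive "is" and two consecutive "test" are each reduced to one.
        System.out.println(collapse("this is is is a test test"));
        // "for formatting" is left alone because of the final \b.
        System.out.println(collapse("good for formatting"));
    }
}
```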

Kobi 2010-08-04 04:52:57

That helped out a lot. Is there a way to check for things that are different case? Like "test Test"?

Crystal 2010-08-05 04:03:27

@Crystal - Thanks! You can add `(?i)` at the beginning of the regex to make it case-insensitive; that seems to be the standard solution for `replaceAll`.
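
A quick sketch of the `(?i)` variant, with hypothetical class and method names. Note that the replacement is `$1`, so the casing of the first occurrence is the one that survives.

```java
public class DuplicateWordsIgnoreCase {
    // Same regex as before, with the (?i) flag so the backreference ignores case.
    static String collapse(String s) {
        return s.replaceAll("(?i)\\b(\\w+)\\b(\\s+\\1)+\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("test Test passed"));
        System.out.println(collapse("Test test passed"));
    }
}
```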

Kobi 2010-08-05 04:16:50

Another question, Kobi, if you have a second: if I am looping through an ArrayList that holds the lines of words from a test file, using a foreach loop like for (String s : lineOfWords) { s = s.replaceAll..., how would I add this new "s" to the new ArrayList that I return? I think it has to do with shallow vs. deep copy, but I'm not sure. I tried pseudo-coding it in my initial question above. Thanks!

Crystal 2010-08-06 01:09:14
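
One possible way to handle the loop from the last comment, as a sketch only. Java strings are immutable, so replaceAll returns a new String, and reassigning the loop variable s does not change the list being iterated. Adding the result to a second list is enough, and no deep copy is involved. The class name, method name and sample data are assumptions; lineOfWords is the name used in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollapseLines {
    // Builds a new list holding the cleaned-up version of every line.
    static List<String> collapseAll(List<String> lineOfWords) {
        List<String> cleaned = new ArrayList<>(lineOfWords.size());
        for (String s : lineOfWords) {
            // replaceAll returns a new String; the original element is untouched.
            cleaned.add(s.replaceAll("(?i)\\b(\\w+)\\b(\\s+\\1)+\\b", "$1"));
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("this is is a test", "no repeats here");
        System.out.println(collapseAll(lines));
    }
}
```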

Pattern, matcher in Java, REGEX help
