ansaurus

Question

Keeping track of punctuation, spacing, when editing a file in Java

Answer 1

A:

How about something like this? I'm assuming the duplicate check should be case-insensitive.

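    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // \b anchors the match to whole words, so the "is" inside "this is" is not treated as a duplicate.
    Pattern p = Pattern.compile("\\b(\\w+) \\1\\b");
    String line = "Hello hello this is a test test in order\norder to see if it deletes duplicates Duplicates words.";

    // Match against an upper-cased copy so the comparison ignores case;
    // the offsets found there are then applied to the original line.
    Matcher m = p.matcher(line.toUpperCase());

    StringBuilder sb = new StringBuilder(1000);
    int idx = 0;

    while (m.find()) {
        // Keep everything up to the end of the first word, then skip the space and the repeated word.
        sb.append(line.substring(idx, m.end(1)));
        idx = m.end();
    }
    sb.append(line.substring(idx));

    System.out.println(sb.toString());

Here's the output:

Hello this is a test in order
order to see if it deletes duplicates words.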

limc 2010-08-03 16:46:44
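
A variation on the snippet in the answer above, as a minimal sketch rather than a definitive version: instead of upper-casing a copy of the input, it assumes Pattern.CASE_INSENSITIVE is acceptable, which lets the backreference \1 match regardless of case, so every offset refers directly to the original string. The class and method names (DuplicateWordSketch, removeDuplicateWords) are made up for illustration, and the \b word boundaries are there so a match cannot start or end inside a longer word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class DuplicateWordSketch {
    // \b keeps the match on whole words; CASE_INSENSITIVE lets \1 match "hello" after "Hello".
    private static final Pattern DUPLICATE =
            Pattern.compile("\\b(\\w+) \\1\\b", Pattern.CASE_INSENSITIVE);

    static String removeDuplicateWords(String line) {
        Matcher m = DUPLICATE.matcher(line);
        StringBuilder sb = new StringBuilder(line.length());
        int idx = 0;
        while (m.find()) {
            // Copy everything up to the end of the first word, then skip " word".
            sb.append(line, idx, m.end(1));
            idx = m.end();
        }
        // Copy whatever follows the last match unchanged.
        sb.append(line, idx, line.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicateWords("Hello hello this is a test test."));
    }
}
```

Running main prints "Hello this is a test." Punctuation and spacing outside the skipped span are copied through untouched. A run of three identical words is only reduced by one per pass, since each match consumes two of them.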

Can you explain your code in more detail, starting with the sb.append part? I'm not sure exactly how it works. Thanks.

Crystal 2010-08-04 04:34:53

The "1" in m.end(1) refers to the first capturing group in the regex (the part in parentheses). m.end(1) returns the offset just past the last character of that group, while m.end() returns the offset just past the last character of the entire match. Basically, I'm skipping everything between m.end(1) and m.end() because it is the space plus the duplicate of the text between m.start(1) and m.end(1). I don't use m.start(1) because there's no need for it here: idx already marks where the next copy starts. Hope this helps.

limc 2010-08-04 14:10:40
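
To make the offsets in the explanation above concrete, here is a small sketch that reports start(1), end(1) and end() for the first match in a string. The class name, the helper firstMatchOffsets and the sample input are assumptions chosen for illustration; only Matcher.start(int), Matcher.end(int) and Matcher.end() come from the standard java.util.regex API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatcherOffsetsSketch {
    // Returns {start(1), end(1), end()} for the first match, or null if there is none.
    static int[] firstMatchOffsets(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(1), m.end(1), m.end() };
    }

    public static void main(String[] args) {
        String input = "a test test here";
        int[] o = firstMatchOffsets("\\b(\\w+) \\1\\b", input);
        // Group 1 is the first copy of the word, which is kept.
        System.out.println("group 1: [" + o[0] + ", " + o[1] + ")");
        // The span from end(1) to end() is the space plus the repeated word, which is skipped.
        System.out.println("skipped: [" + o[1] + ", " + o[2] + ")");
        System.out.println("kept:    '" + input.substring(0, o[1]) + "'");
        System.out.println("dropped: '" + input.substring(o[1], o[2]) + "'");
    }
}
```

For this input the three offsets are 2, 6 and 11, so "a test" is kept and " test" is dropped. Both end offsets are exclusive, which is why they can be passed straight to substring.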
