tags:

views:

87

answers:

2

Hello, I was wondering if i can get some help with this problem. Suppose I had a string

34342

I would like to find the number of pairs in this string, which would be two. How would i go about doing that?


EDIT: Ok what i really wanted was to match the occurrences of characters that are the same in the string.

+8  A: 

You can use backreferences to find pairs of things that appear in a row:

(\d+)\1

This will match one or more digit character followed by the same sequence again. \1 is a backreference which refers to the contents of the first capturing group.


If you want to match numbers that appear multiple times in the string, you could use a pattern like

(\d)(?=\d*\1)

Again we're using a backreference, but this time we also use a lookahead as well. A lookahead is a zero-width assertion which specifies something that must be matched (or not matched, if using a negative lookahead) after the current position in the string, but doesn't consume any characters or move the position the regex engine is at in the string. In this case, we will assert that the contents of the first capture group must be found again, though not necessarily directly beside the first one. By specifying \d* within the lookahead, it will only be considered a pair if it is within the same number (so if there's a space between numbers, the pair won't be matched -- if this is undesired, the \d can be changed to ., which will match any character).

It'll match the first 3 and 4 in 34342 and the first 1, 2, 3, and 4 in 12332144. Note however that if you have an odd number of repetitions, you will get an extra match (ie. 1112 will match the first two 1s), because lookaheads do not consume.

Daniel Vandersluis
+1 I think this is a good solution using `regex`. However, I don't think `regex` is the ideal technology for this problem.
linuxuser27
@linuxuser27 agreed. The question was about regex so I tried to come up with a regex solution, but it'd be better done using string functions.
Daniel Vandersluis
No I think it is pretty clever. I have never used the `lookahead` on regex so I learned something today. Thanks!
linuxuser27
+1  A: 

Here's one way, if a regex doesn't seem appropriate. One method here uses a map, the other uses pure arrays. I don't really know what a pair is. Is "555" three pairs, one pair, or what? So these routines print a list of all characters that occur more than once.

public class Pairs {

    public static void main(String[] args) {
        usingMap("now is the time for all good men");
        System.out.println("-----------");
        usingArrays("now is the time for all good men");
    }

    private static void usingMap(String s) {
        Map<Character, Integer> m = new TreeMap<Character, Integer>();

        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (m.containsKey(c)) {
                m.put(c, m.get(c) + 1);
            } else {
                m.put(c, 1);
            }
        }
        for (Character c : m.keySet()) {
            if (m.get(c) > 1) {
                System.out.println(c + ":" + m.get(c));
            }
        }
    }

    private static void usingArrays(String s) {
        int count[] = new int[256];
        for (int i = 0; i < count.length; i++) count[i] = 0;

        for (int i = 0; i < s.length(); i++) {
            count[s.charAt(i)]++;
        }
        for (int i = 0; i < count.length; i++) {
            if (count[i] > 1) {
                System.out.println((char) i + ":" + count[i]);
            }
        }
    }
}
Tony Ennis