views:

288

answers:

3

The following code is trying to remove any duplicate characters in a string.Iam not sure if the code is right??Can anybody help me with the working of the code i.e whats actually happening when there is a match in characters?

public static void removeDuplicates(char[] str) {
  if (str == null) return;
  int len = str.length;
  if (len < 2) return;
  int tail = 1;
  for (int i = 1; i < len; ++i) {
    int j;
    for (j = 0; j < tail; ++j) {
      if (str[i] == str[j]) break;
    }
    if (j == tail) {
      str[tail] = str[i];
      ++tail;
    }
  }
  str[tail] = 0;
}
A: 

This would be much easier if you just looped through the array and added all new characters to a list, then retruned that list.

With this approach, you need to reshuffle the array as you step through it and eventually redimension it to the appropriate size in the end.

ck
Thanks ck but iam trying do the code inplace without using any additional buffer.
Jonathan
+1  A: 

The function looks fine to me. I've written inline comments. Hope it helps:

// function takes a char array as input.
// modifies it to remove duplicates and adds a 0 to mark the end
// of the unique chars in the array.
public static void removeDuplicates(char[] str) {
  if (str == null) return; // if the array does not exist..nothing to do return.
  int len = str.length; // get the array length.
  if (len < 2) return; // if its less than 2..can't have duplicates..return.
  int tail = 1; // number of unique char in the array.
  // start at 2nd char and go till the end of the array.
  for (int i = 1; i < len; ++i) { 
    int j;
    // for every char in outer loop check if that char is already seen.
    // char in [0,tail) are all unique.
    for (j = 0; j < tail; ++j) {
      if (str[i] == str[j]) break; // break if we find duplicate.
    }
    // if j reachs tail..we did not break, which implies this char at pos i
    // is not a duplicate. So we need to add it our "unique char list"
    // we add it to the end, that is at pos tail.
    if (j == tail) {
      str[tail] = str[i]; // add
      ++tail; // increment tail...[0,tail) is still "unique char list"
    }
  }
  str[tail] = 0; // add a 0 at the end to mark the end of the unique char.
}
codaddict
thanks cool its now simple
Jonathan
The code is not fine; the last line causes `ArrayIndexOutOfBoundsException` if there aren't any dupes.
polygenelubricants
@polygenelubricants: Good catch.
codaddict
+2  A: 

Your code is, I'm sorry to say, very C-like.

A Java String is not a char[]. You say you want to remove duplicates from a String, but you take a char[] instead.

Is this char[] \0-terminated? Doesn't look like it because you take the whole .length of the array. But then your algorithm tries to \0-terminate a portion of the array. What happens if the arrays contains no duplicates?

Well, as it is written, your code actually throws an ArrayIndexOutOfBoundsException on the last line! There is no room for the \0 because all slots are used up!

You can add a check not to add \0 in this exceptional case, but then how are you planning to use this code anyway? Are you planning to have a strlen-like function to find the first \0 in the array? And what happens if there isn't any? (due to all-unique exceptional case above?).

What happens if the original String/char[] contains a \0? (which is perfectly legal in Java, by the way, see JLS 10.9 An Array of Characters is Not a String)

The result will be a mess, and all because you want to do everything C-like, and in place without any additional buffer. Are you sure you really need to do this? Why not work with String, indexOf, lastIndexOf, replace, and all the higher-level API of String? Is it provably too slow, or do you only suspect that it is?

"Premature optimization is the root of all evils". I'm sorry but if you can't even understand what the original code does, then figuring out how it will fit in the bigger (and messier) system will be a nightmare.


My minimal suggestion is to do the following:

  • Make the function takes and returns a String, i.e. public static String removeDuplicates(String in)
  • Internally, works with char[] str = in.toCharArray();
  • Replace the last line by return new String(str, 0, tail);

This does use additional buffers, but at least the interface to the rest of the system is much cleaner.


Alternatively, you can use StringBuilder as such:

static String removeDuplicates(String s) {
    StringBuilder noDupes = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
        String si = s.substring(i, i + 1);
        if (noDupes.indexOf(si) == -1) {
            noDupes.append(si);
        }
    }
    return noDupes.toString();
}

Note that this is essentially the same algorithm as what you had, but much cleaner and without as many little corner cases, etc.

polygenelubricants