views:

1474

answers:

16

I'm working on a program that uses an ArrayList to store Strings. The program prompts the user with a menu and allows the user to choose an operation to perform. Such operations are adding Strings to the List, printing the entries etc. What I want to be able to do is create a method called removeDuplicates().This method will search the ArrayList and remove any duplicated values. I want to leave one instance of the duplicated value(s) within the list. I also want this method to return the total number of duplicates removed.

I've been trying to use nested loops to accomplish this but I've been running into trouble because when entries get deleted, the indexing of the ArrayList gets altered and things don't work as they should. I know conceptually what I need to do but I'm having trouble implementing this idea in code.

Here is some pseudo code:

start with first entry; check each subsequent entry in the list and see if it matches the first entry; remove each subsequent entry in the list that matches the first entry;

after all entries have been examined, move on to the second entry; check each entry in the list and see if it matches the second entry; remove each entry in the list that matches the second entry;

repeat for entry in the list

Here's the code I have so far:

public int removeDuplicates()
{
  int duplicates = 0;

  for ( int i = 0; i < strings.size(); i++ )
  {
     for ( int j = 0; j < strings.size(); j++ )
     {
        if ( i == j )
        {
          // i & j refer to same entry so do nothing
        }

        else if ( strings.get( j ).equals( strings.get( i ) ) )
        {
           strings.remove( j );
           duplicates++;
        }
     }
 }

   return duplicates;
}

UPDATE: It appears that Will is looking for a homework solution that involves developing the algorithm to remove duplicates, rather than a pragmatic solution using Sets. See his comment:

Thx for the suggestions. This is part of an assignment and I believe the teacher had intended for the solution to not include sets. In other words, I am to come up with a solution that will search for and remove duplicates without implementing a hashSet. The teacher suggested using nested loops which is what I'm trying to do but I've been having some problems with the indexing of the ArrayList after certain entries are removed.

+15  A: 

Why not use a collection such as Set (and an implementation like HashSet) which naturally prevents duplicates?

matt b
+1, using a Set is the best option. If you want to count the number of duplicates removed, store in an List as before, then construct a Set by passing a List into the constructor and then comparing the size difference between the two to get the number of duplicates.
Peter
+1 for the solution -1 for not a suitable solution for a `homework` = 0 pts. :( @Will didn't tag it as such tough
OscarRyz
what if preservation of order is important?
Carl
@Carl - use a LinkedHashSet then.
Ravi Wallau
To use set you will have to implement Equals inorder for the Set to work correctly on user created objects.
monksy
+5  A: 

Just to clarify my comment on matt b's answer, if you really want to count the number of duplicates removed, use this code:

List<String> list = new ArrayList<String>();

// list gets populated from user input...

Set<String> set = new HashSet<String>(list);
int numDuplicates = list.size() - set.size();
Peter
well, ive thought about a hashset but this is part of an assignment and the teacher didn't mention the hashset as a possible solution. I think we're supposed to come up with an implmentation without using hashset.
Will
Okay, so it's your understanding that this is an assignment to see if you can develop the proper algorithm for removing duplicates, rather than just "getting it done"? I will clarify your initial question/post.
Peter
right, at least thats what I got out of it anyway.
Will
A: 

Using a set is the best option to remove the duplicates:

If you have a list of of arrays you can remove the duplicates and still retain array list features:

 List<String> strings = new ArrayList<String>();
 //populate the array
 ...
 List<String> dedupped = new ArrayList<String>(new HashSet<String>(strings));
 int numdups = strings.size() - dedupped.size();

if you can't use a set, sort the array (Collections.sort()) and iterate over the list, checking if the current element is equal to the previous element, if it is, remove it.

Theo
A: 

moved to to original post

Will
A: 

Using a set is the best option (as others suggested).

If you want to compare all elements in a list with eachother you should slightly adapt your for loops:

for(int i = 0; i < max; i++)
    for(int j = i+1; j < max; j++)

This way you don't compare each element only once instead of twice. This is because the second loop start at the next element compared to the first loop.

Also when removing from a list when iterating over them (even when you use a for loop instead of an iterator), keep in mind that you reduce the size of the list. A common solution is to keep another list of items you want to delete, and then after you finished deciding which to delete, you delete them from the original list.

Thirler
+3  A: 

I've been trying to use nested loops to accomplish this but I've been running into trouble because when entries get deleted, the indexing of the ArrayList gets altered and things don't work as they should

Why don't you just decrease the counter each time you delete an entry.

When you delete an entry the elements will move too:

ej:

String [] a = {"a","a","b","c" }

positions:

a[0] = "a";
a[1] = "a";    
a[2] = "b";
a[3] = "c";

After you remove your first "a" the indexes are:

a[0] = "a";
a[1] = "b";
a[2] = "c";

So, you should take this into consideration and decrease the value of j ( j--) to avoid "jumping" over a value.

See this screenshot:

its working

OscarRyz
Give it a good try and let me know if you need to see that missing snippet. You're almost there!!
OscarRyz
Offtopic: that's a nice code font there. Which one is it?
BalusC
@BalusC: I have no idea. I'll try asking on SuperUser ( although I'm pretty sure it will be closed as "no computer related )
OscarRyz
@BalusC: It is Monaco http://superuser.com/questions/121123/whats-the-name-of-this-font
OscarRyz
I see. Mac only. There's however a Windows clone http://www.webdevkungfu.com/textmate-envy-aka-monaco-font-for-windows/ Thanks :)
BalusC
+1  A: 
public Collection removeDuplicates(Collection c) {
// Returns a new collection with duplicates removed from passed collection.
    Collection result = new ArrayList();

    for(Object o : c) {
        if (!result.contains(o)) {
            result.add(o);
        }
    }

    return result;
}

or

public void removeDuplicates(List l) {
// Removes duplicates in place from an existing list
    Object last = null;
    Collections.sort(l);

    Iterator i = l.iterator();
    while(i.hasNext()) {
        Object o = i.next();
        if (o.equals(last)) {
            i.remove();
        } else {
            last = o;
        }
    }
}

Both untested.

Will Hartung
A: 
public ArrayList removeDuplicates(ArrayList <String> inArray)
{
    ArrayList <String> outArray = new ArrayList();
    boolean doAdd = true;
    for (int i = 0; i < inArray.size(); i++)
    {
        String testString = inArray.get(i);
        for (int j = 0; j < inArray.size(); j++)
        {
            if (i == j)
            {
                break;
            }
            else if (inArray.get(j).equals(testString))
            {
                doAdd = false;
                break;
            }

        }
        if (doAdd)
        {
            outArray.add(testString);
        }
        else
        {
            doAdd = true;
        }

    }
    return outArray;

}
DougM
A: 

You could replace the duplicate with an empty string*, thus keeping the indexing in tact. Then after you've completed you can strip out the empty strings.

*But only if an empty string isn't valid in your implementation.

Smalltown2000
A: 
public <Foo> Entry<Integer,List<Foo>> uniqueElementList(List<Foo> listWithPossibleDuplicates) {
  List<Foo> result = new ArrayList<Foo>();//...might want to pre-size here, if you have reliable info about the number of dupes
  Set<Foo> found = new HashSet<Foo>(); //...again with the pre-sizing
  for (Foo f : listWithPossibleDuplicates) if (found.add(f)) result.add(f);
  return entryFactory(listWithPossibleDuplicates.size()-found.size(), result);
}

and then some entryFactory(Integer key, List<Foo> value) method. If you want to mutate the original list (possibly not a good idea, but whatever) instead:

public <Foo> int removeDuplicates(List<Foo> listWithPossibleDuplicates) {
  int original = listWithPossibleDuplicates.size();
  Iterator<Foo> iter = listWithPossibleDuplicates.iterator();
  Set<Foo> found = new HashSet<Foo>();
  while (iter.hasNext()) if (!found.add(iter.next())) iter.remove();
  return original - found.size();
}

for your particular case using strings, you may need to deal with some additional equality constraints (e.g., are upper and lower case versions the same or different?).

EDIT: ah, this is homework. Look up Iterator/Iterable in the Java Collections framework, as well as Set, and see if you don't come to the same conclusion I offered. The generics part is just gravy.

Carl
A: 

Sort. Get an iterator and iterate, saving a "lastEntry" as you go. Use remove when the current one matches lastEntry, otherwise update lastEntry. Done.

The key loop should look like this:

String lastEntry = null
Iterator<String> i = myArrayList.iterator()
while (i.hasNext()) {
  String k = i.next() ;
  if (k.equals(j)) i.remove(); else j=k;
}
Rex Kerr
A: 

Assuming you can't use a Set like you said, the easiest way of solving the problem is to use a temporary list, rather than attempting to remove the duplicates in place:

public class Duplicates {

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("one");
        list.add("one");
        list.add("two");
        list.add("three");
        list.add("three");
        list.add("three");

        System.out.println("Prior to removal: " +list);
        System.out.println("There were " + removeDuplicates(list) + " duplicates.");
        System.out.println("After removal: " + list);
    }

    public static int removeDuplicates(List<String> list) {
        int removed = 0;
        List<String> temp = new ArrayList<String>();

        for(String s : list) {
            if(!temp.contains(s)) {
                temp.add(s);
            } else {
                //if the string is already in the list, then ignore it and increment the removed counter
                removed++;
            }
        }

        //put the contents of temp back in the main list
        list.clear();
        list.addAll(temp);

        return removed;
    }

}
Jared Russell
+2  A: 

You can use nested loops without any problem:

public static int removeDuplicates(ArrayList<String> strings) {

    int size = strings.size();
    int duplicates = 0;

    // not using a method in the check also speeds up the execution
    // also i must be less that size-1 so that j doesn't
    // throw IndexOutOfBoundsException
    for (int i = 0; i < size - 1; i++) {
        // start from the next item after strings[i]
        // since the ones before are checked
        for (int j = i + 1; j < size; j++) {
            // no need for if ( i == j ) here
            if (!strings.get(j).equals(strings.get(i)))
                continue;
            duplicates++;
            strings.remove(j);
            // decrease j because the array got re-indexed
            j--;
            // decrease the size of the array
            size--;
        } // for j
    } // for i

    return duplicates;

}
Azder
Without testing it, this looks to be ideal. Note that the inner index starts one after the outer (you don't need to check from the beginning of the list every time because you've already checked up to the outer index value for duplicates).Most importantly, it seems to actually answer the question asked!
AndyT
A: 

You could try this one liner to take a copy of the String preserving order.

List<String> list;
List<String> dedupped = new ArrayList<String>(new LinkedHashSet<String>(list));

This approach is also O(n*ln(n)) instead of O(n^2)

Peter Lawrey
A: 

The problem you are seeing in your code is that you remove an entry during iteration, thus invalidating the iteration location.

For example:

{"a", "b", "c", "b", "b", "d"} 
       i         j  

Now you are removing strings[j].

{"a", "b", "c", "b", "d"} 
       i         j  

The inner loop ends and j is incremented.

{"a", "b", "c", "b", "d"} 
       i              j

Only one duplicate 'b' detected...oops.

best practice in these cases is to store the locations that have to be removed, and remove them after you have finished iterating through the arraylist. (One bonus, the strings.size() call can be optimized outside of the loops by you or the compiler)

Tip, you can start iterating with j at i+1, you've already checked the 0 - i!

NomeN
A: 

The inner for loop is invalid. If you delete an element, you cannot increment j, since j is now pointing at the element after the one you deleted, and you will need to inspect it.

In other words, you should use a while loop instead of a for loop, and only increment j if the elements at i and j do not match. If they do match, remove the element at j. size() will decrease by 1 and j will now be pointing at the following element, so there is no need to increase j.

Also, there is no reason to inspect all elements in the inner loop, just the ones following i, since duplicates before i have already been removed by prior iterations.

markusk