views:

227

answers:

9

Hello i want to remove duplicates from a list i do this but not working

List<Customer> listCustomer = new ArrayList<Customer>();    
for (Customer customer: tmpListCustomer)
{
  if (!listCustomer.contains(customer)) 
  {
    listCustomer.add(customer);
  }
 }
+5  A: 

I suspect you might not have Customer.equals() implemented properly (or at all).

List.contains() uses equals() to verify whether any of its elements is identical to the object passed as parameter. However, the default implementation of equals tests for physical identity, not value identity. So if you have not overwritten it in Customer, it will return false for two distinct Customer objects having identical state.

Here are the nitty-gritty details of how to implement equals (and hashCode, which is its pair - you must practically always implement both if you need to implement either of them). Since you haven't shown us the Customer class, it is difficult to give more concrete advice.

As others have noted, you are better off using a Set rather than doing the job by hand, but even for that, you still need to implement those methods.

Péter Török
how i can implemet this ?
Mercer
Override equals and hashCode methods from java.lang.Object. You'll want to read this: http://java.sun.com/developer/Books/effectivejava/Chapter3.pdf
duffymo
the correct way to remove duplicates from a list in Java is to use a Set. And you can't just override equals() without overriding hashCode() as well.
fuzzy lollipop
@Mercer, see my update.
Péter Török
@fuzzy lollipop: A Set is not magic, and can't detect that two Customers are equal when you haven't written the code to tell them. Using Set will have exactly the same results as the posted code, just faster.
DJClayworth
@fuzzy, of course, I was just adding the relevant explanation while you were writing your comment. Why are you riding this single technical detail, which - although of course important _in the long run_ - has nothing to do directly with solving the concrete question?
Péter Török
+3  A: 

If that code doesn't work, you probably have not implemented equals(Object) on the Customer class appropriately.

Presumably there is some key (let us call it customerId) that uniquely identifies a customer; e.g.

class Customer {
    private String customerId;
    ...

An appropriate definition of equals(Object) would look like this:

    public boolean equals(Object obj) {
        if (obj == this) {
            return true;
        }
        if (!(obj instanceof Customer)) {
            return false;
        }
        Customer other = (Customer) obj;
        return this.customerId.equals(other.customerId);
    }

For completeness, you should also implement hashCode so that two Customer objects that are equal will return the same hash value. A matching hashCode for the above definition of equals would be:

    public int hashCode() {
        return customerId.hashCode();
    }

It is also worth noting that this is not an efficient way to remove duplicates if the list is large. (For a list with N customers, you will need to perform N*(N-1)/2 comparisons in the worst case ... when there are no duplicates.) For a more efficient solution you should use something like a HashSet to do the duplicate checking.

Stephen C
you have to override hashCode() if you override equals() I can't believe all you guys with 10's of thousands of rep are forgetting something this basic.
fuzzy lollipop
@fuzzy - I did NOT forgot. I just saved the answer in stages. Look at the timestamps on the edits.
Stephen C
+2  A: 

Just add all your elements to a Set: it does not allow it's elements to be repeated. If you need a list afterwards, use new ArrayList(theSet) constructor afterwards (where theSet is your resulting set).

folone
Using a Set would produce exactly the same results as the code written above, just faster. The poster says "doesn't work" not "works too slowly".
DJClayworth
well Set works and his code doesn't so which is better working code that is correct and you don't have to write, or buggy code that you don't really understand and doesn't work.
fuzzy lollipop
I think you are assuming that he only wants to remove duplicate references to the same object. If that were the case, then the posted code would work.
DJClayworth
@fuzzy lollipop: Set does exactly what his code does. It's almost certainly the equals(Object) and hashCode() methods that are the problem here; the difference between object equality and value equality.
Dean J
A: 

Without seeing your object, I can suggest two things:

  • don't implement hashCode() and equals()
  • implement hashCode() and equals() using the same field(s), that will be your criteria for equality. Let your IDE (Eclipse, NetBeans) generate these methods.

Then, you can do the following, which will eliminate duplicates, because Sets don't allow duplicates

listCustomer = new ArrayList<Customer>(new HashSet<Customer>(listCustomer));
Bozho
For that to work, Mercer will need to implement hashcode anyway. If using treeset, Mercer will need to implement the comparable interface. AFAIK there is nothing wrong about overriding equals().
Marcelo Morales
you have to override hashCode() as well if you override equals()
fuzzy lollipop
yes, I updated my answer to reflect that.
Bozho
A: 

The correct answer for Java is use a Set. If you already have a List<Customer> and want to de duplicate it

Set<Customer> s = new HashSet<Customer>(listCustomer);

Otherise just use a Set implemenation HashSet, TreeSet directly and skip the List construction phase.

You will need to override hashCode() and equals() as well to make sure that the behavior you want actually what you get. equals() can be as simple as comparing unique ids of the objects to as complex as comparing every field. hashCode() can be as simple as returning the hashCode() of the unique id' String representation or the hashCode().

fuzzy lollipop
A Set would do the same as the posted code, just faster.
DJClayworth
speed isn't as important as maintainability, you don't have to maintain the code for Set and it is self documenting and the correct Java idiom.
fuzzy lollipop
Once again, using a Set does not address the issue. It just gives the wrong answer faster. Now that you've edited your answer to include the need to override equals it is at least technically correct, but still placing the emphasis in the wrong place.
DJClayworth
Homer:You can do it the right way, the wrong way, or the Max Power way. Bart:What's the Max Power way? Homer: It's the wrong way, only faster.
DJClayworth
A: 

As others have mentioned, you are probably not implementing equals() correctly.

However, you should also note that this code is considered quite inefficient, since the runtime could be the number of elements squared.

You might want to consider using a Set structure instead of a List instead, or building a Set first and then turning it into a list.

Uri
+1  A: 

Two suggestions:

  • Use a HashSet instead of an ArrayList. This will speed up the contains() checks considerably if you have a long list

  • Make sure Customer.equals() and Customer.hashCode() are implemented properly, i.e. they should be based on the combined values of the underlying fields in the customer object.

mikera
A: 

The "contains" method searched for whether the list contains an entry that returns true from Customer.equals(Object o). If you have not overridden equals(Object) in Customer or one of its parents then it will only search for an existing occurrence of the same object. It may be this was what you wanted, in which case your code should work. But if you were looking for not having two objects both representing the same customer, then you need to override equals(Object) to return true when that is the case.

It is also true that using one of the implementations of Set instead of List would give you duplicate removal automatically, and faster (for anything other than very small Lists). You will still need to provide code for equals.

You should also override hashCode() when you override equals().

DJClayworth
Would whoever downvoted this like to explain why?
DJClayworth
...and if that was you, fuzzy, please stop being petty.
DJClayworth
I did not downvote it, but I think your suggestion to override `equals` to delete duplicates might have earned it.
Alexander Pogrebnyak
You mean the suggestion that is the same as the accepted answer?
DJClayworth
@DJClayworth: After reading your post more carefully I do agree that it's totally correct ( On my first reading I though you suggested to do a `special case` equals ). You get my +1 for unfair downvoting. On the other hand, looking at other posts here, somebody had been on the downvoting vengeance spree.
Alexander Pogrebnyak
Thanks. I appreciate that.
DJClayworth
+1  A: 

Does Customer implement the equals() contract?

If it doesn't implement equals() and hashCode(), then listCustomer.contains(customer) will check to see if the exact same instance already exists in the list (By instance I mean the exact same object--memory address, etc). If what you are looking for is to test whether or not the same Customer( perhaps it's the same customer if they have the same customer name, or customer number) is in the list already, then you would need to override equals() to ensure that it checks whether or not the relevant fields(e.g. customer names) match.

Note: Don't forget to override hashCode() if you are going to override equals()! Otherwise, you might get trouble with your HashMaps and other data structures. For a good coverage of why this is and what pitfalls to avoid, consider having a look at Josh Bloch's Effective Java chapters on equals() and hashCode() (The link only contains iformation about why you must implement hashCode() when you implement equals(), but there is good coverage about how to override equals() too).

By the way, is there an ordering restriction on your set? If there isn't, a slightly easier way to solve this problem is use a Set<Customer> like so:

Set<Customer> noDups = new HashSet<Customer>();
noDups.addAll(tmpListCustomer);
return new ArrayList<Customer>(noDups);

Which will nicely remove duplicates for you, since Sets don't allow duplicates. However, this will lose any ordering that was applied to tmpListCustomer, since HashSet has no explicit ordering (You can get around that by using a TreeSet, but that's not exactly related to your question). This can simplify your code a little bit.

Scott Fines
+1 for remembering that Set can't be used if you need to maintain order.
DJClayworth