views:

2483

answers:

9

I'm having problems with Iterator.remove() called on a HashSet.

I've a Set of time stamped objects. Before adding a new item to the Set, I loop through the set, identify an old version of that data object and remove it (before adding the new object). the timestamp is included in hashCode and equals(), but not equalsData().

for (Iterator<DataResult> i = allResults.iterator(); i.hasNext();)
{
    DataResult oldData = i.next();
    if (data.equalsData(oldData))
    {   
        i.remove();
        break;
    }
}
allResults.add(data)

The odd thing is that i.remove() silently fails (no exception) for some of the items in the set. I've verified

  • The line i.remove() is actually called. I can call it from the debugger directly at the breakpoint in Eclipse and it still fails to change the state of Set

  • DataResult is an immutable object so it can't have changed after being added to the set originally.

  • The equals and hashCode() methods use @Override to ensure they are the correct methods. Unit tests verify these work.

  • This also fails if I just use a for statement and Set.remove instead. (e.g. loop through the items, find the item in the list, then call Set.remove(oldData) after the loop).

  • I've tested in JDK 5 and JDK 6.

I thought I must be missing something basic, but after spending some significant time on this my colleague and I are stumped. Any suggestions for things to check?

EDIT:

There have been questions - is DataResult truly immutable. Yes. There are no setters. And when the Date object is retrieved (which is a mutable object), it is done by creating a copy.

public Date getEntryTime()
{
    return DateUtil.copyDate(entryTime);
}

public static Date copyDate(Date date)
{
    return (date == null) ? null : new Date(date.getTime());
}

FURTHER EDIT (some time later): For the record -- DataResult was not immutable! It referenced an object which had a hashcode which changed when persisted to the database (bad practice, I know). It turned out that if a DataResult was created with a transient subobject, and the subobject was persisted, the DataResult hashcode was changed.

Very subtle -- I looked at this many times and didn't notice the lack of immutability.

+1  A: 

Have you tried something like

boolean removed = allResults.remove(oldData)
if (!removed) // COMPLAIN BITTERLY!

In other words, remove the object from the Set and break the loop. That won't cause the Iterator to complain. I don't think this is a long term solution but would probably give you some information about the hashCode, equals and equalsData methods

Ken Gentle
A: 

Thanks. Responses:

  • Iterator not being guaranteed to work. (a) for HashSet implementaiton it does and (b) it should throw an exception if not.

  • Did I try allResults.remove(oldData)? Yes I did- same result. (I tried it outside of the loop). It returns "false" indicating it couldnt remove.

  • Writing a compareTo method. It has one, but I think it's not used here (unless it's a TreeSet). HashSet uses the hashCode, I believe. I will go back and double check compareTo works properly just in case.

Will Glass
A: 

If there are two entries with the same data, only one of them is replaced... have you accounted for that? And just in case, have you tried another collection data structure that doesn't use a hashcode, say a List?

Zach Scrivena
A set dont allow dupes...
Frederic Morin
that's right, but dupes are defined by the equals() method, which in this case uses the data + timestamp.
Zach Scrivena
+1  A: 

Hi Will, Are you absolutely certain that DataResult is immutable? What is the type of the timestamp? If it's a java.util.Date are you making copies of it when you're initializing the DataResult? Keep in mind that java.util.Date is mutable.

For instance:

Date timestamp = new Date();
DataResult d = new DataResult(timestamp);
System.out.println(d.getTimestamp());
timestamp.setTime(System.currentTimeMillis());
System.out.println(d.getTimestamp());

Would print two different times.

It would also help if you could post some source code.

Jack Leow
@Jack Leow: That's a great comment. However, I am copying the date object before exposing via the getter. (Been reading edition 2 of Bloch's Effectiva Java recently). So DataResult is truly immutable.
Will Glass
copying the value or the reference ?
Frederic Morin
@Blade: creates a new instance of Date. See edit to original question.
Will Glass
+3  A: 

Under the covers, HashSet uses HashMap, which calls HashMap.removeEntryForKey(Object) when either HashSet.remove(Object) or Iterator.remove() is called. This method uses both hashCode() and equals() to validate that it is removing the proper object from the collection.

If both Iterator.remove() and HashSet.remove(Object) are not working, then something is definitely wrong with your equals() or hashCode() methods. Posting the code for these would be helpful in diagnosis of your issue.

Spencer K
A: 

I'm not up to speed on my Java, but I know that you can't remove an item from a collection when you are iterating over that collection in .NET, although .NET will throw an exception if it catches this. Could this be the problem?

David
While it is sometimes true that you cannot remove an element from a collection backing an iterator (a ConcurrentModificationException could be thrown), you can remove the element directly through the Iterator itself, if it supports the action, which it does for HashSet.
Spencer K
Thanks for educating me. :)
David
And the iterators which don't support removal at all throw UnsupportedOperationException on any remove() attempt.
Dov Wasserman
+1  A: 

Thanks for all the help. I suspect the problem must be with equals() and hashCode() as suggested by spencerk. I did check those in my debugger and with unit tests, but I've got to be missing something.

I ended up doing a workaround-- copying all the items except one to a new Set. For kicks, I used Apache Commons CollectionUtils.

    Set<DataResult> tempResults = new HashSet<DataResult>();
    CollectionUtils.select(allResults, 
            new Predicate()
            {
                public boolean evaluate(Object oldData)
                {
                    return !data.equalsData((DataResult) oldData);
                }
            }
            , tempResults);
    allResults = tempResults;

I'm going to stop here-- too much work to simplify down to a simple test case. But the help is miuch appreciated.

Will Glass
+2  A: 

I was very curious about this one still, and wrote the following test:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;
import java.util.Set;

public class HashCodeTest {
    private int hashCode = 0;

    @Override public int hashCode() {
     return hashCode ++;
    }

    public static void main(String[] args) {
     Set<HashCodeTest> set = new HashSet<HashCodeTest>();

     set.add(new HashCodeTest());
     System.out.println(set.size());
     for (Iterator<HashCodeTest> iter = set.iterator();
       iter.hasNext();) {
      iter.next();
      iter.remove();
     }
     System.out.println(set.size());
    }
}

which results in:

1
1

If the hashCode() value of an object has changed since it was added to the HashSet, it seems to render the object unremovable.

I'm not sure if that's the problem you're running into, but it's something to look into if you decide to re-visit this.

Jack Leow
Thanks-- that's much appreciated. I suspect you might be right. Will go revisit my unit tests for equals/hashCode on the base object.
Will Glass
A: 

It's almost certainly the case the hashcodes don't match for the old and new data that are "equals()". I've run into this kind of thing before and you essentially end up spewing hashcodes for every object and the string representation and trying to figure out why the mismatch is happening.

If you're comparing items pre/post database, sometimes it loses the nanoseconds (depending on your DB column type) which can cause hashcodes to change.

Chris Kessel