views:

922

answers:

9

I want to be able to remove multiple elements from a set while I am iterating over it. Initially I hoped that iterators were smart enough for the naive solution below to work.

Set<SomeClass> set = new HashSet<SomeClass>();
fillSet(set);
Iterator<SomeClass> it = set.iterator();
while (it.hasNext()) {
    set.removeAll(setOfElementsToRemove(it.next()));
}

But this throws a ConcurrentModificationException.

Note that iterator.remove() will not work as far as I can see because I need to remove multiple things at a time. Also assume that it is not possible to identify which elements to remove "on the fly", but it is possible to write the method setOfElementsToRemove(). In my specific case it would take up a lot of memory and processing time to determine what to remove while iterating. Making copies is also not possible because of memory constraints.

setOfElementsToRemove() will generate some set of SomeClass instances that I want to remove, and fillSet(set) will fill the set with entries.

After searching Stack Overflow I could not find a good solution to this problem but a few hours break later I realized the following would do the job.

Set<SomeClass> set = new HashSet<SomeClass>();
Set<SomeClass> outputSet = new HashSet<SomeClass>();
fillSet(set);
while (!set.isEmpty()) {
    Iterator<SomeClass> it = set.iterator();
    SomeClass instance = it.next();
    outputSet.add(instance);
    set.removeAll(setOfElementsToRemoveIncludingThePassedValue(instance));
}

setOfElementsToRemoveIncludingThePassedValue() will generate a set of elements to remove that includes the value passed to it. We need to remove the passed value so set will empty.

My question is whether anyone has a better way of doing this or whether there are collection operations that support these kind of removals.

Also, I thought I would post my solution because there seems to be a need and I wanted to contribute the the excellent resource that is Stack Overflow.

+2  A: 

There's a simple answer to this - use the Iterator.remove() method.

Andrzej Doyle
That will not work in this case. It only removes the current element returned by the iterator, I need to remove multiple elements at a time.
Nash0
Then just call remove on each element you want to remove.
Ben S
Unless you want to remove a bunch of elements based on a condition (i.e. remove both elements when a duplicate element is found) this is the way to go. Otherwise, go with what Peter added.
Malaxeur
Because I need to be able to remove arbitrary elements from the set at any point during the iteration over the set. In my specific case there is no easy way of knowing 'on the fly' whether the current element should be removed. Sorry I should make that clear, and will.
Nash0
+1  A: 

Why don't you use the iterator's remove method on the objects you want to remove?

Iterators were introduced mainly because enumerators couldn't handle deleting while enumerating.

Ben S
Why the down vote?
Ben S
A: 

You should call Iterator.remove method.

Also note, that on most java.util collections the remove method will generate exception if the contents of the collection have changed. So, if the code is multi-threaded use extra caution, or use concurrent collections.

Alexander Pogrebnyak
+6  A: 

Normally when you remove an element from a collection while looping over the collection, you'll get a Concurrent Modification Exception. This is partially why the Iterator interface has a remove() method. Using an iterator is the only safe way to modify a collection of elements while traversing them.

The code would go something like this:

Set<SomeClass> set = new HashSet<SomeClass>();
fillSet(set);
Iterator<SomeClass> setIterator = set.iterator();
while (setIterator.hasNext()) {
    SomeClass currentElement = setIterator.next();
    if (setOfElementsToRemove(currentElement).size() > 0) {
        setIterator.remove();
    }
}

This way you'll safely remove all elements that generate a removal set from your setOfElementsToRemove().

EDIT

Based on a comment to another answer, this may be more what you want:

Set<SomeClass> set = new HashSet<SomeClass>();
Set<SomeClass> removalSet = new HashSet<SomeClass>();
fillSet(set);

for (SomeClass currentElement : set) {
    removalSet.addAll(setOfElementsToRemove(currentElement);
}

set.removeAll(removalSet);
Peter Nix
Yep your second answer will work, other than possibly running into memory problems +1 though
Nash0
Looks good, but I'd replace the last loop in your second example with set.removeAll(removalSet).
rob
@rob yeah, its obvious when it gets pointed out. I'll proof read my code better next time.
Peter Nix
+5  A: 

Any solution that involves removing from the set you're iterating while you're iterating it, but not via the iterator, will absolutely not work. Except possibly one: you could use a Collections.newSetFromMap(new ConcurrentHashMap<SomeClass, Boolean>(sizing params)). The catch is that now your iterator is only weakly consistent, meaning that each time you remove an element that you haven't encountered yet, it's undefined whether that element will show up later in your iteration or not. If that's not a problem, this might work for you.

Another thing you can do is build up a toRemove set as you go instead, then set.removeAll(itemsToRemove); only at the end. Or, copy the set before you start, so you can iterate one copy while removing from the other.

EDIT: oops, I see Peter Nix had already suggested the toRemove idea (although with an unnecessarily hand-rolled removeAll).

Kevin Bourrillion
+4  A: 

Instead of iterating through all the elements in the Set to remove the ones you want, you can actually use Google Collections (not something you can't do it on your own though) and apply a Predicate to mask the ones you don't need.

package com.stackoverflow.q1675037;

import java.util.HashSet;
import java.util.Set;

import org.junit.Assert;
import org.junit.Test;

import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;
import com.google.common.collect.Sets;


public class SetTest
{
public void testFilter(final Set<String> original, final Set<String> toRemove, final Set<String> expected)
{

    Iterable<String> mask = Iterables.filter(original, new Predicate<String>()
    {
        @Override
        public boolean apply(String next) {
        return !toRemove.contains(next);
        }
    });

    HashSet<String> filtered = Sets.newHashSet(mask);

    Assert.assertEquals(original.size() - toRemove.size(), filtered.size());
    Assert.assertEquals(expected, filtered);        
}


@Test
public void testFilterNone()
{
    Set<String> original = new HashSet<String>(){
        {
            this.add("foo");
            this.add("bar");
            this.add("foobar");
        }
    };

    Set<String> toRemove = new HashSet();

    Set<String> expected = new HashSet<String>(){
        {
            this.add("foo");                
            this.add("bar");
            this.add("foobar");
        }
    };

    this.testFilter(original, toRemove, expected);
}

@Test
public void testFilterAll()
{
    Set<String> original = new HashSet<String>(){
        {
            this.add("foo");
            this.add("bar");
            this.add("foobar");
        }
    };

    Set<String> toRemove = new HashSet<String>(){
        {
            this.add("foo");
            this.add("bar");
            this.add("foobar");
        }
    };

    HashSet<String> expected = new HashSet<String>();
    this.testFilter(original, toRemove, expected);
}    

@Test
public void testFilterOne()
{
    Set<String> original = new HashSet<String>(){
        {
            this.add("foo");
            this.add("bar");
            this.add("foobar");
        }
    };

    Set<String> toRemove = new HashSet<String>(){
        {
            this.add("foo");
        }
    };

    Set<String> expected = new HashSet<String>(){
        {
            this.add("bar");
            this.add("foobar");
        }
    };

    this.testFilter(original, toRemove, expected);
}    


@Test
public void testFilterSome()
{
    Set<String> original = new HashSet<String>(){
        {
            this.add("foo");
            this.add("bar");
            this.add("foobar");
        }
    };

   Set<String> toRemove = new HashSet<String>(){
        {
            this.add("bar");
            this.add("foobar");
        }
    };

    Set<String> expected = new HashSet<String>(){
        {
            this.add("foo");
        }
    };

    this.testFilter(original, toRemove, expected);
}    
}
Cue
A+ for effort and quality :) +1
Nash0
Could be simplified by using `Sets.difference()`
finnw
+1  A: 

If you have enough memory for one copy of the set, I'll assume you also have enough memory for two copies. The Kafka-esque rules you cite don't seem to forbid that :)

My suggestion, then:

fillSet(set);
fillSet(copy);
for (Object item : copy) {
   if (set.contains(item)) { // ignore if not
     set.removeAll(setOfStuffToRemove())
   }
}

so copy stays intact and just provides the stuff to loop on, while set suffers deletions. Stuff that was removed from set in the meantime will be ignored.

Carl Smotricz
+2  A: 

You could try the java.util.concurrent.CopyOnWriteArraySet which gives you an iterator that is a snapshot of the set at the time of the iterator creation. Any changes you make to the set (i.e. by calling removeAll) won't be visible in the iterator, but are visible if you look at the set itself (and removeAll won't throw.

joe p
A: 

It is possible to implement a Set that allows its elements to be removed whilst iterating over it.

I think the standard implementations (HashSet, TreeSet etc.) disallow it because that means they can use more efficient algorithms, but it's not hard to do.

Here's an incomplete example using Google Collections:

import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import com.google.common.base.Predicates;
import com.google.common.collect.ForwardingSet;
import com.google.common.collect.Iterators;
import com.google.common.collect.Sets;

public class ConcurrentlyModifiableSet<E>
extends ForwardingSet<E> {
 /** Create a new, empty set */
 public ConcurrentlyModifiableSet() {
  Map<E, Boolean> map = new ConcurrentHashMap<E, Boolean>();
  delegate = Sets.newSetFromMap(map);
 }

 @Override
 public Iterator<E> iterator() {
  return Iterators.filter(delegate.iterator(), Predicates.in(delegate));
 }

 @Override
 protected Set<E> delegate() {
  return this.delegate;
 }

 private Set<E> delegate;
}

Note: The iterator does not support the remove() operation (but the example in the question does not require it.)

finnw