views:

120

answers:

4

I have several ArrayLists of Integer objects, stored in a HashMap.

I want to get a list (ArrayList) of all the numbers (Integer objects) that appear in each list.

My thinking so far is:

  1. Iterate through each ArrayList and put all the values into a HashSet
    • This will give us a "listing" of all the values in the lists, but only once
  2. Iterate through the HashSet
    2.1 With each iteration perform ArrayList.contains()
    2.2 If none of the ArrayLists return false for the operation add the number to a "master list" which contains all the final values.

If you can come up with something faster or more efficient, funny thing is as I wrote this I came up with a reasonably good solution. But I'll still post it just in case it is useful for someone else.

But of course if you have a better way please do let me know.

A: 
  1. Create a Set (e.g. HashSet) from the first List.
  2. For each remaining list:
    • call set.retainAll (list) if both list and set are small enough
    • otherwise call set.retainAll (new HashSet <Integer> (list))

I cannot say after which thresholds second variant of step 2. becomes faster, but I guess maybe > 20 in size or so. If your lists are all small, you can not bother with this check.

As I remember Apache Collections have more efficient integer-only structures if you care not only about O(*) part, but also about the factor.

doublep
This is a horrible mutation of Ankur's first solution, creating a new HashSet for every list in the map will basically cause you to waste some O(n^2) space on nothing. This is java, the GC is nondeterministic. The GC could collect the unused hashsets after an unknown amount of time, which means an O(n^2) amount of memory will be sitting there, allocated, but not put to use. Or in other words, wasted.
Rubys
@Rubys: I don't see where you get O(n^2). In case I wasn't clear `set` is the one created on the first step. I.e. it is the same throughout the loop. Creating "intermediate" sets on step 2a is meant to speedup lookup (in `retainAll`), because in hash set it is (expected) O(1) vs. O(n) in lists.
doublep
For all we know, the list and the set are never small enough, and in every iteration you will create a new HashSet. The hashet itself will take O(n) space in memory. It's not O(n^2), that was my bad, it's O(nm) space, where n is the biggest list and m is the number of lists in the original set. You see, in each iteration you create a new hash set, which costs O(n) space. Since you have to put those pointers -somewhere-. Therefore, in all the m iterations together, you will use up O(nm) SPACE. Time will be wonderful still.
Rubys
+2  A: 

You have to change step 1: - Use the shortest list instead of your hashSet (if it isn't in the shortest list it isn't in all lists...)

Then call contains on the other lists and remove value as soon as one return false (and skip further tests for this value)

At the end the shortest list will contain the answer...

some code:

public class TestLists {

    private static List<List<Integer>> listOfLists = new ArrayList<List<Integer>>();

    private static List<Integer> filter(List<List<Integer>> listOfLists) {

        // find the shortest list
        List<Integer> shortestList = null;
        for (List<Integer> list : listOfLists) {
            if (shortestList == null || list.size() < shortestList.size()) {
                shortestList = list;
            }
        }

        // create result list from the shortest list
        final List<Integer> result = new LinkedList<Integer>(shortestList);

        // remove elements not present in all list from the result list
        for (Integer valueToTest : shortestList) {
            for (List<Integer> list : listOfLists) {
                // no need to compare to itself
                if (shortestList == list) {
                    continue;
                }

                // if one list doesn't contain value, remove from result and break loop
                if (!list.contains(valueToTest)) {
                    result.remove(valueToTest);
                    break;
                }
            }
        }

        return result;
    }


    public static void main(String[] args) {
        List<Integer> l1 = new ArrayList<Integer>(){{
            add(100);
            add(200);
        }};
        List<Integer> l2 = new ArrayList<Integer>(){{
            add(100);
            add(200);
            add(300);
        }};
        List<Integer> l3 = new ArrayList<Integer>(){{
            add(100);
            add(200);
            add(300);
        }};
        List<Integer> l4 = new ArrayList<Integer>(){{
            add(100);
            add(200);
            add(300);
        }};
        List<Integer> l5 = new ArrayList<Integer>(){{
            add(100);
            add(200);
            add(300);
        }};
        listOfLists.add(l1);
        listOfLists.add(l2);
        listOfLists.add(l3);
        listOfLists.add(l4);
        listOfLists.add(l5);
        System.out.println(filter(listOfLists));

    }

}
pgras
+4  A: 

I am not sure that I understand your goal. But if you wish to find the intersection of a collection of List<Integer> objects, then you can do the following:

public static List<Integer> intersection(Collection<List<Integer>> lists){
    if (lists.size()==0)
        return Collections.emptyList();

    Iterator<List<Integer>> it = lists.iterator();
    HashSet<Integer> resSet = new HashSet<Integer>(it.next());
    while (it.hasNext())
        resSet.retainAll(new HashSet<Integer>(it.next()));

    return new ArrayList<Integer>(resSet);
}

This code runs in linear time in the total number of items. Actually this is average linear time, because of the use of HashSet.

Also, note that if you use ArrayList.contains() in a loop, it may result in quadratic complexity, since this method runs in linear time, unlike HashSet.contains() that runs in constant time.

Eyal Schneider
probably worthwhile to throw an empty check on resSet in your while loop as well.
Carl
oh, and you don't need to construct a new hash set for each it.next() - retainAll works on collections, and duplicate elements in it.next() won't affect the operation.
Carl
edit: I suppose there's some savings for some cases of retainAll, but in this particularly case a custom method might be in order anyways.
Carl
@Carl: If I use retainAll on the list itself, it will increase the time complexity. X.retainAll(Y) works in O(|X|*|Y|) time when Y is a simple List implementation. When Y is HashSet, it works in O(|X|) average time, so the copying is worthwhile.
Eyal Schneider
A: 

Using the Google Collections Multiset makes this (representation-wise) a cakewalk (though I also like Eyal's answer). It's probably not as efficient time/memory-wise as some of the other's here, but it's very clear what's going on.

Assuming the lists contain no duplicates within themselves:

Multiset<Integer> counter = HashMultiset.create();
int totalLists = 0;
// for each of your ArrayLists
{
 counter.addAll(list);
 totalLists++;
}

List<Integer> inAll = Lists.newArrayList();

for (Integer candidate : counter.elementSet())
  if (counter.count(candidate) == totalLists) inAll.add(candidate);`

if the lists might contain duplicate elements, they can be passed through a set first:

counter.addAll(list) => counter.addAll(Sets.newHashSet(list))

Finally, this is also ideal if you want might want some additional data later (like, how close some particular value was to making the cut).

Another approach that slightly modifies Eyal's (basically folding together the act of filtering a list through a set and then retaining all the overlapping elements), and is more lightweight than the above:

public List<Integer> intersection(Iterable<List<Integer>> lists) {

 Iterator<List<Integer>> listsIter = lists.iterator();
 if (!listsIter.hasNext()) return Collections.emptyList();
 Set<Integer> bag = new HashSet<Integer>(listsIter.next());
 while (listsIter.hasNext() && !bag.isEmpty()) { 
  Iterator<Integer> itemIter = listsIter.next().iterator();
  Set<Integer> holder = new HashSet<Integer>(); //perhaps also pre-size it to the bag size
  Integer held;
  while (itemIter.hasNext() && !bag.isEmpty())
   if ( bag.remove(held = itemIter.next()) )
    holder.add(held);
  bag = holder;
 }
 return new ArrayList<Integer>(bag);
}
Carl