Hello all,

This is mainly a performance question. I have a master list of all users in a String array, AllUids. I also have a list of all end-dated users in a String array, EndUids.

I am working in Java and my goal is to remove any users that exist in the end dated array from the master list AllUids. I know PHP has a function called array_diff.

I was curious whether Java has anything that will compare two arrays and remove the elements that appear in both. My objective is performance, which is why I asked about a built-in function. I do not want to add any special packages.

I thought about writing a recursive function, but it seems like it would be inefficient. There are thousands of users in both lists. Every user in the end-dated list also exists in AllUids, at least until it is removed.

Example:

String[] AllUids = {"Joe", "Tom", "Dan", "Bill", "Hector", "Ron"};

String[] EndUids = {"Dan", "Hector", "Ron"};

Functionality I am looking for:

String[] ActiveUids = AllUids.RemoveSimilar(EndUids);

ActiveUids would look like this:

{"Joe", "Tom", "Bill"}

Thank you all. Obviously I can come up with loops and such, but I am not confident that they will be efficient. This is something that will run on production machines every day.

+2  A: 

Commons Collections has a class called CollectionUtils with a static method called removeAll, which takes an initial collection and a collection of things to remove from it:

Collection removeAll(Collection collection,
                     Collection remove)

That should do what you want provided you use lists of users rather than arrays. You can convert your array into a list very easily with Arrays.asList() so...

Collection ActiveUids = CollectionUtils.removeAll(Arrays.asList(AllUids), 
                                                  Arrays.asList(EndUids));

EDIT: I did a bit more digging in Commons Collections and found another solution there, using ListUtils:

List diff = ListUtils.subtract(Arrays.asList(AllUids), Arrays.asList(EndUids));

Pretty neat...

Jon
CollectionUtils did not work for me, but when I called ActiveUids.removeAll(EndUids), it worked perfectly. I ended up changing the way I stored the Strings. I created a HashSet using the following: HashSet<String> ActiveUids = new HashSet<String>(); Thank you all for your help. This was exactly what I was looking for!
Cool, I think the ListUtils way is much cleaner though....
Jon
+1  A: 

You can't "remove" elements from arrays. You can set them to null, but arrays are of fixed size.

You could use java.util.Set and removeAll to take one set away from another, but I'd prefer to use the Google Collections Library:

Set<String> allUids = Sets.newHashSet("Joe", "Tom", "Dan",
                                      "Bill", "Hector", "Ron");
Set<String> endUids = Sets.newHashSet("Dan", "Hector", "Ron");
Set<String> activeUids = Sets.difference(allUids, endUids);

That has a more functional feel to it.

Jon Skeet
One thing that's a bit surprising about this approach is that `Sets.difference` returns a *view*. If you actually want `activeUids` to be a "normal" `Set` (i.e. one whose contents don't change if `allUids` and `endUids` happen to change, and where calling `size()` is O(1)), you should probably immediately pass the result of `Sets.difference` to something that constructs a Set, e.g. `Sets.newHashSet(Sets.difference(a, b))`.
Laurence Gonsalves
+1  A: 

You could put those strings into a Collection instead, and then use removeAll method.
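As a rough illustration of that suggestion, here is a minimal sketch using only the JDK. The class and method names (RemoveAllSketch, activeUids) are made up for this example. It assumes the arrays from the question and copies the master list into an ArrayList first, because the list returned by Arrays.asList is fixed-size and does not support removal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, named only for this example.
class RemoveAllSketch {
    // Returns the entries of allUids that do not appear in endUids, in their original order.
    static List<String> activeUids(String[] allUids, String[] endUids) {
        // Copy into a resizable list; Arrays.asList alone is fixed-size.
        List<String> active = new ArrayList<>(Arrays.asList(allUids));
        active.removeAll(Arrays.asList(endUids));
        return active;
    }

    public static void main(String[] args) {
        String[] allUids = {"Joe", "Tom", "Dan", "Bill", "Hector", "Ron"};
        String[] endUids = {"Dan", "Hector", "Ron"};
        System.out.println(activeUids(allUids, endUids)); // prints [Joe, Tom, Bill]
    }
}
```

Note that removeAll checks each element against the argument with contains, so passing a plain list as the argument can get slow for large inputs; wrapping the argument in a HashSet is one way to avoid that.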

Samuel Carrijo
+1  A: 

The easiest solution is probably to put all of the elements into a Set and then use removeAll. You can convert to a Set from an array like this:

Set<String> activeUids = new HashSet<String>(Arrays.asList(activeUidsArray));

though you should really try to avoid using arrays and favor collections.
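If an array is still needed at the end, the Set approach might look something like the sketch below. The names SetDiffSketch and activeUids are invented for illustration, and the sketch assumes that the order of the result does not matter and that duplicate IDs can be collapsed, since a HashSet gives neither ordering nor duplicates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, named only for this example.
class SetDiffSketch {
    // Returns the IDs in allUids that are not in endUids; result order is unspecified.
    static String[] activeUids(String[] allUids, String[] endUids) {
        Set<String> active = new HashSet<>(Arrays.asList(allUids));
        // Use a HashSet for the argument too, so membership checks stay cheap.
        active.removeAll(new HashSet<>(Arrays.asList(endUids)));
        return active.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] allUids = {"Joe", "Tom", "Dan", "Bill", "Hector", "Ron"};
        String[] endUids = {"Dan", "Hector", "Ron"};
        // Prints Joe, Tom and Bill in some order.
        System.out.println(Arrays.toString(activeUids(allUids, endUids)));
    }
}
```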

Laurence Gonsalves
+1  A: 

Don't use arrays for this, use Collection and the removeAll() method. As for performance: unless you do something idiotic that leads to O(n^2) runtime, just forget about it. It's premature optimization, the useless/harmful kind. "thousands of users" is nothing, unless you're doing it thousands of times each second.

BTW, PHP "arrays" are in fact hash maps.
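To make the O(n^2) point concrete: the slow version is a nested loop that scans the whole end-dated array for every user. A hash-based lookup avoids that. The following is only a sketch, with invented names (LinearDiffSketch, activeUids), of a single pass that keeps the original order and builds a new array rather than modifying the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, named only for this example.
class LinearDiffSketch {
    // Roughly O(n + m): one pass to build the lookup set, one pass over the master list.
    static String[] activeUids(String[] allUids, String[] endUids) {
        Set<String> ended = new HashSet<>(Arrays.asList(endUids));
        List<String> active = new ArrayList<>(allUids.length);
        for (String uid : allUids) {
            // contains on a HashSet is a hash lookup, not a scan of endUids.
            if (!ended.contains(uid)) {
                active.add(uid);
            }
        }
        return active.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] allUids = {"Joe", "Tom", "Dan", "Bill", "Hector", "Ron"};
        String[] endUids = {"Dan", "Hector", "Ron"};
        System.out.println(Arrays.toString(activeUids(allUids, endUids))); // prints [Joe, Tom, Bill]
    }
}
```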

Michael Borgwardt
A: 

pythonnnnnnnnnnn

john