views:

118

answers:

5

How would I compare two arrays that might have different lengths and get the difference between each array?

For example:

Animal animals[] = { new Cat(), new Dog() };

Animal animals2[] = { new Cat(), new Dog(), new Alligator() };

How would I compare them two arrays and make it return the instance of Alligator?

A: 

Well, you could maybe use Set instead and use the removeAll() method.

Or you could use the following simple and slow algorithm for doing:

List<Animal> differences = new ArrayList<Animal>();

    for (Animal a1 : animals) {
       boolean isInSecondArray = false;
       for (Animal a2 : animals2) {
           if (a1 == a2)  {
                isInSecondArray = true;
                break;
           }
       } 

       if (!isInSecondArray)
           differences.add(a1)
    }

Then differences will have all the objects that are in animals array but not in animals2 array. In a similar way you can do the opposite (get all the objects that are in animals2 but not in animals).

Thanks but this algorithm needs to be fast as it's looped quite quickly. Also, about it being an array of Objects instead of an array of Animals - that was a mistake as it was just an example.
Gnarly
@Gnarly - You may want to test with this suggestion, first, and then get some timing information, as, it may be fast enough for your needs, since other parts of your program may be slowing down the solution. Before you optimize, which means more complexity, you should have an idea as to where the slowdown is, then you can come back and say you need a way that will go through two arrays up to size n (some number) in x ms (give some numbers for n and x).
James Black
It needs to be looped every 5ms.
Gnarly
This will not work as two instances of the same class will not compare to be the same. new Cat() == new Cat() will always be false.
Steve Kuo
@Steve Kuo - if you look at the first comment under functional, you will see that it was not to be of Objects, so, if you downvoted everyone because of that, you may want to reconsider.
James Black
I would like to point out that this is an `O(N**2)` algorithm. The time taken to calculate the difference between randomly generated arrays is proportional to the *square* of the list size. Comparing 1000 element lists takes 1 million times as long as comparing 1 element lists.
Stephen C
@James, you are correct. The question is incorrectly phrased.
Steve Kuo
A: 

I suggest that you put your objects in sets and then use an intersection of the sets:

// Considering you put your objects in setA and setB

Set<Object> intersection = new HashSet<Object>(setA);
intersection.retainAll(setB);

After that you can use removeAll to get a difference to any of the two sets:

setA.removeAll(intersection);
setB.removeAll(intersection);

Inspired by: http://hype-free.blogspot.com/2008/11/calculating-intersection-of-two-java.html

dark_charlie
The intersection will tell you what they have in common, so you would then need to remove those elements from the two lists, in order to know what is the difference, by what remains in both of them.
James Black
@James Black: Yes, that can be achieved for example by removeAll as suggested by functional. The intersection is general in the way that you can get the "overlapping" elements of any of the two sets.
dark_charlie
@dark_charlie - As I mentioned, in order to get what the OP asked you would need to do another step; you may want to update your answer.
James Black
+1  A: 

You may want to look at this article for more information:

http://download-llnw.oracle.com/javase/tutorial/collections/interfaces/set.html

As was mentioned, removeAll() is made for this, but you will want to do it twice, so that you can create a list of all that are missing in both, and then you could combine these two results to have a list of all the differences.

But, this is a destructive operation, so if you don't want to lose the information, copy the Set and operate on that one.

UPDATE:

It appears that my assumption of what is in the array is wrong, so removeAll() won't work, but with a 5ms requirement, depeending on the number of items to search it could be a problem.

So, it would appear a HashMap<String, Animal> would be the best option, as it is fast in searching.

Animal is an interface with at least one property, String name. For each class that implements Animal write code for Equals and hashCode. You can find some discussion here: http://www.ibm.com/developerworks/java/library/j-jtp05273.html. This way, if you want the hash value to be a combination of the type of animal and the name then that will be fine.

So, the basic algorithm is to keep everything in the hashmaps, and then to search for differences, just get an array of keys, and search through to see if that key is contained in the other list, and if it isn't put it into a List<Object>, storing the value there. You will want to do this twice, so, if you have at least a dual-core processor, you may get some benefit out of having both searches being done in separate threads, but then you will want to use one of the concurrent datatypes added in JDK5 so that you don't have to worry about synchronizations in the combined list of differences.

So, I would write it first as a single-thread and test, to get some ideas as to how much faster it is, also comparing it to the original implmemntation. Then, if you need it faster, try using threads, again, compare to see if there is a speed increase.

Before making any optimization ensure you have some metrics on what you already have, so that you can compare and see if the one change will lead to an increase in speed.

If you make too many changes at a time, one may have a large improvement on speed, but others may lead to a performance decrease, and it wouldn't be seen, which is why each change should be one at a time.

Don't lose the other implementations though, by using unit tests and testing perhaps 100 times each, you can get an idea as to what improvement each change gives you.

James Black
A: 

I don't care about perf for my usages (and you shouldn't either, unless you have a good reason to, and you find out via your profiler that this code is the bottleneck).

What I do is similar to functional's answer. I use LINQ set operators to get the exception on each list:

http://msdn.microsoft.com/en-us/library/bb397894.aspx

Edit:

Sorry, I didn't notice this is Java. Sorry, I'm off in C# la-la land, and they look very similar :)

Merlyn Morgan-Graham
+4  A: 

I would suggest that your question needs to be clarified. Currently, everyone is guessing what about what you are actually asking.

  • Are the arrays intended to represent sets, or lists, or something in between? In other words, does element order matter, and can there be duplicates?
  • What does "equal" mean? Does new Cat() "equal" new Cat()? Your example suggests that it does!!
  • What do you mean by the "difference"? Do you mean set difference?
  • What do you want to happen if the two arrays have the same length?
  • Is this a once-off comparison or does it occur repeatedly for the same arrays?
  • How many elements are there in the arrays (on average)?
  • Why are you using arrays at all?

Making the assumption that these arrays are intended to be true sets, then you probably should be using HashSet instead of arrays, and using collection operations like addAll and retainAll to calculate the set difference.

On the other hand, if the arrays are meant to represent lists, it is not at all clear what "difference" means.

If it is critical that the code runs fast, then you most certainly need to rethink your data structures. If you always start with arrays, you are not going to be able to calculate the "differences" fast ... at least in the general case.

Finally, if you are going to use anything that depends on the equals(Object) method (and that includes any of the Java collection types, you really need to have a clear understanding of what "equals" is supposed to mean in your application. Are all Cat instances equal? Are they all different? Are some Cat instances equal and others not? If you don't figure this out, and implement the equals and hashCode methods accordingly you will get confusing results.

Stephen C
It was only an example.
Gnarly
@Gnarly - examples should be accurate ... otherwise people will waste their time trying to answer questions that you didn't mean. See @James Black's comments, for example.
Stephen C