I have 2 arrays (A and B) that contain similar data with some differences. I would like to return an array of objects that are only in A and another array of objects that are only in B. So far I have been thinking:

  1. Brute force with some optimizations (this is trivial)
  2. Sort the arrays and use a binary search.

What are my other options? This is not a homework assignment, but I think it sounds like one, so I tagged it that way. Any languages/solutions are fair game.

+6  A: 

You could sort both arrays, then do a linear scan through both at the same time. That would be O(n log n) for the sorts and O(n) for the scan / building of the new arrays.
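For concreteness, here is a minimal sketch of that idea in Python, assuming the elements are mutually comparable (the function name is just for illustration):

def sorted_diff(a, b):
    # Sort copies of both arrays, then walk them in lockstep.
    a, b = sorted(a), sorted(b)
    only_a, only_b = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
        elif a[i] < b[j]:
            only_a.append(a[i])   # a[i] cannot appear later in b
            i += 1
        else:
            only_b.append(b[j])   # b[j] cannot appear later in a
            j += 1
    # Whatever remains in either array has no counterpart in the other.
    only_a.extend(a[i:])
    only_b.extend(b[j:])
    return only_a, only_b

The sort dominates at O(n log n); the walk itself touches each element only once.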

Brian R. Bondy
A: 

Try using Sets. They usually have a difference() method (or something like it) that returns exactly what you want. Simple as that. Since the question is language-agnostic, creating the sets and turning the difference back into an array are shown here as generic helper methods.

Set A = createSetA();
Set B = createSetB();

Array onlyAElements = transformToArray(A.difference(B));
Array onlyBElements = transformToArray(B.difference(A));

Alternatively, you could sort both arrays and get both difference arrays at the same time. Something like:

int aIndex = 0;
int bIndex = 0;

// A and B are int[] arrays, both already sorted in ascending order.
// (Needs java.util.List and java.util.ArrayList.)
List<Integer> aOnly = new ArrayList<>();
List<Integer> bOnly = new ArrayList<>();

while (aIndex < A.length && bIndex < B.length)
{
    if (A[aIndex] == B[bIndex])
    {
        aIndex++;
        bIndex++;
    }
    else if (A[aIndex] < B[bIndex])
    {
        aOnly.add(A[aIndex]);   // A[aIndex] cannot appear later in B
        aIndex++;
    }
    else
    {
        bOnly.add(B[bIndex]);   // B[bIndex] cannot appear later in A
        bIndex++;
    }
}

// Whatever is left in either array has no counterpart in the other.
while (aIndex < A.length) { aOnly.add(A[aIndex++]); }
while (bIndex < B.length) { bOnly.add(B[bIndex++]); }

The two small loops at the end drain whatever is left once one of the arrays is exhausted, so the indices never go out of bounds. The code is just to convey the main idea.

Samuel Carrijo
Just what I was going to say. Python's sets module, for example, gives you difference() or simply the "-" operator. http://docs.python.org/library/sets.html
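For example, with Python's built-in set type (a rough sketch; it assumes the elements are hashable and quietly drops duplicates):

a = [3, 1, 4, 1, 5]
b = [1, 5, 9]

only_in_a = list(set(a) - set(b))   # difference: in a but not in b
only_in_b = list(set(b) - set(a))   # in b but not in a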
MatrixFrog
I think he's looking for the algorithm, which the set abstraction hides. There are plenty of easy one-liners for this (think LINQ), but they don't really teach us anything, and we have no idea how efficient they are without reading the documentation.
JoshJordan
The two algorithms for sets that I know of are hash sets and tree sets. Google or SO search for those terms.
MatrixFrog
Agree about sets, but the question did ask for an algorithm, not a data structure. Perhaps you could expand your answer to show the algorithm behind the sets.
Brian R. Bondy
Well, the difference() method does all the work for you, usually efficiently. But I added some code anyway.
Samuel Carrijo
A: 

I don't have an implementation or algorithm beyond what's already been said, but I thought I would leave this C#/LINQ solution for anyone who finds this question and wants to do this:

    // Requires: using System.Linq;
    var a = new int[] { 1, 2, 3, 6, 7, 8, 9, 10 };
    var b = new int[] { 1, 2, 3, 4, 5, 6, 7 };

    int[] addedToA = a.Except(b).ToArray();      // elements only in a
    int[] missingFromA = b.Except(a).ToArray();  // elements only in b

    foreach (var i in addedToA)
    {
        Console.Write("{0} ", i);
    }
    Console.WriteLine();
    foreach (var i in missingFromA)
    {
        Console.Write("{0} ", i);
    }

This prints:

8 9 10
4 5
Robert Cartaino
+1  A: 

A lot of this is going to depend on what type of data you have. You mention sorting, so I take it the elements are comparable. With sets of size m and n, sorting will take O(m lg m + n lg n), and that will dominate. (Asymptotically, it won't matter whether you then do binary searches or walk both lists; walking both lists is O(m + n).) Of course, if your data admits a better sort, such as integers with radix sort, you should be able to get the whole thing down to O(m + n).

Using sets (as others are suggesting) implicitly suggests using hashing, which will definitely make your problem easier. Hash all the elements of A (O(m)) and store them in a hash set in memory, then hash each element of B (O(n)) and check whether it collides with something already in the set. This becomes a matter for optimization: you have to evaluate a classic speed-memory trade-off. The larger your hash set, the quicker the collision checks will be. This will run in O(m + n).
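A rough sketch of that approach in Python, assuming the elements are hashable (the function name is made up for illustration):

def difference(b, a):
    # Elements of b that never appear in a, using a hash set built from a.
    seen = set(a)                           # O(m) insertions
    return [x for x in b if x not in seen]  # O(n) expected-constant lookups

Running it in both directions gives the two arrays the question asks for.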

It's worth noting that you can prove any algorithm that does what you ask must take at least Ω(m + n) time, since every input element has to be looked at.

David Berger
@David: when comparing the sorting and hash-table approaches you also need to consider 1) the cost of computing the hash function versus the cost of a comparison (optimized for the not-equals case), and 2) whether the hash function gives a good spread.
Stephen C
@Stephen absolutely! I didn't want to get into those considerations because they tend to require assumptions about the inputs which we don't have.
David Berger
+2  A: 

I'd stuff array A's elements into a hash table, then iterate over array B, doing lookups in that table to efficiently determine which elements of B are also in A (and, by exclusion, which are only in B). Then do the same in the other direction: put B's elements into a hash table and iterate over array A. That would be O(N) throughout.
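A short sketch of that two-way version in Python, again assuming hashable elements (the names are illustrative):

def diff_both_ways(a, b):
    # One hash table per array: O(N) to build both, O(N) to scan both.
    in_a, in_b = set(a), set(b)
    only_a = [x for x in a if x not in in_b]   # lookups against B's table
    only_b = [x for x in b if x not in in_a]   # lookups against A's table
    return only_a, only_b

Unlike converting both arrays straight to sets, this keeps duplicates and preserves the original order of the surviving elements.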

Jeremy Friesner
Hash tables tend to give faster algorithms, but ones that usually use more memory. Good answer, by the way.
Samuel Carrijo