ansaurus

Question

Distinguishing extra element from two arrays?

Answer 1

+1 A:

Depending on the language you are using, it might have a built in way to do the array differences. PHP does http://ca3.php.net/array_diff

Mark Story 2009-10-19 18:51:51

Thats a inbuilt function you are talking about.I expect an approach (logic)

Ravi 2009-10-19 19:02:54

Using 'inbuilt functions' has to count, doesn't it? It seems to me goal is to make a solution in whatever language, otherwise there's not much of a way to avoid the functions given by your language unless you're writing machine code.

Alex JL 2009-10-26 06:26:41

Code Duck -- in some interviews prospective employers test for raw CS problem solving ability by having you do it as if you had no convenience functions or language constructs to make it easier. It's a test of pure algorithm problem more than "can you solve it" problem. FWIW.

artlung 2009-10-31 06:28:38

Answer 2

A:

It is impossible to solve this problem without making comparisons. Some modern languages do have set difference and other aggregate operators, but they work by internally making comparisons. If the total array size is odd (not the case here) then it does work to xor the elements together. I suppose the question of whether the ALU's xor op should be considered a comparison is debatable.

And without specifying a language, you can't refer to a library, so the only possible solution is a comparison-based pseudo-code exposition.

So here you go:

a <- first element
a_count = 1
b_count = 0
loop over remaining elements
   if element != a
     b <- element
     ++b_count
   else
     ++a_count
   if found_result
      break loop
end loop

found_result
   if a_count > 1 and b_count > 0
     the unique one is b
     return true
   if b_count > 1 and a_count > 0 # a_acount actually always > 0
     the unique one is a
     return true
   return false

DigitalRoss 2009-10-19 18:57:56

Answer 3

A:

Here's some simple pseudocode for a solution. I'm assuming it's OK to modify the arrays and that arrays have a remove(value) method (or that you could write one trivially). It takes the 2 arrays and returns an array containg 2 values, the first is the unique value from the first array and the second is the unique value from the 2nd array.

function findUnique(a:Array, b:Array):Array {
  var uniqueFromA:int;
  var uniqueFromB:int;

  for each(value:int in a) {
    var len:int = b.length;
    b.remove(value);
    /* b's length didn't change, so nothing was removed, so the value doesn't
     * exist in it. */
    if(b.length == len) {
      uniqueFromA = value;
    }
  }

  /* Only the unique value in b still exists in b */
  uniqueFromB = b[0];

  return [uniqueFromA, uniqueFromB];
}

Herms 2009-10-19 19:05:28

Answer 4

+1 A:

Here's another possibility. Unlike my previous answer it doesn't modify the arrays passed in, and should have a lower big-O bound (O(n) instead of O(n^2) - assuming constant time hashtable lookups), but will take up significantly more memory.

function findUnique(a:Array, b:Array):Array {
  var aHash:Hashtable = buildHash(a);
  var bHash:Hashtable = buildHash(b);

  var uniqueFromA:int;
  var uniqueFromB:int;

  for each(value:int in a) {
    if(!bHash.contains(value)) {
      uniqueFromA = value;
      break;
    } else {
      /* Not necessary, but will speed up the 2nd for-loop by removing
       * values we know are duplicates. */
      bHash.remove(value);
    }
  }

  for each(value:int in b) {
    if(!aHash.contains(value)) {
      uniqueFromB = value;
      break;
    }
  }

  return [uniqueFromA, uniqueFromB];
}

function buildHash(a:Array):Hashtable {
  var h:Hashtable = new Hashtable();
  for each(value:int in a) {
    h[value] = true;
  }

  return h;
}

Herms 2009-10-19 19:13:31

Answer 5

+2 A:

Technically, you can do it in constant time, since the arrays (and values within them) are limited. For the generalized problem, we have to make out something trickier.

Here's a linear-time solution.

First, we should build a hash based on one array. Value lookup in a hash table takes O(1 + k/n) comparisons [1], where k is the length of hash table. Thus iteration over the first array (that contains n elements) and lookups for each value takes O(n+k).

Then we iterate the other one, looking up for each element in the hash. When an element is not found -- that's unique one from the other array. (O(n+k) again). Then we iterate hash to look for the second unique element (O(k)).

Total time is O(n+k). Since it doesn't make sense to let k be more than n, it's the linear solution.

Perl code for that:

sub unique
{
  my ($arr, $brr) = @_;
  my %hash = map{$_ => 1} @$arr;
  %hash{$_}-- for @$brr;
  return grep {$_} keys %hash;
}

Pavel Shved 2009-10-19 19:14:24

Answer 6

+1 A:

You have given two integer arrays each of size 10.
Both contains 9 equal elements (say 1 to 9)
Only one element is different.

Depending on the constraints, you can solve this very quickly in linear time. If you hold an int[10], then you can assume that an element at index 1 corresponds to the number 1; the element itself holds a count of both arrays. The following pseudocode will solve the problem quickly:

let buckets = new int[10] // init all buckets to zero
for num in arr1 do buckets[num]++ // add numbers from the first array
for num in arr2 do buckets[num]++ // add from the second array
for i in 1 to 9 do                // find odd man out
    if buckets[i] <= 1 then return i

This is essentially a bounded hashtable. This only works if our given list of elements is constrained between 1 and 9.

Technically, you don't even need to keep a running count of each element. You could, in principle, simply loop through arr1, then iterate through arr2 until you run across an element which was not hashed from the first array.

Juliet 2009-10-19 19:15:23

you're only returning a single value. There are 2 unique values, one in each array. And I think the values could be anything, not just 1 through 9.

Herms 2009-10-19 19:18:13

@Herms: looks like we have mutually contradictory interpretations of the OP. I love it when I can say "don't blame the coder, blame the requirements" ;)

Juliet 2009-10-19 19:24:21

Answer 7

+12 A:

If you need this to scale, then I would use one of the many Set implementations in the world. For example, Java's HashSet.

Throw all of the first array in the Set. Then, for each member of the second array, if it is contained in the Set, remove it; otherwise mark it as Unique #2. After this procedure, the last remaining member of the Set is Unique #1.

I'd probably do it this way, even on an interview, and even for simple ten-element arrays. Life is too short to spend trying to find the clever way to scale a wall when there's a perfectly good door in it.

jprete 2009-10-19 19:18:58

+1 and I'll give another +1 to whoever can do without the undesirable comparisons @Ravi *required*. I'm curious if it's even possible to do this without a compare...

Austin Salonen 2009-10-19 19:54:18

Austin, isn't that what one of my answers implemented (though in pseudocode, not java-specific): http://stackoverflow.com/questions/1590405/distinguishing-extra-element-from-two-arrays/1590528#1590528

Herms 2009-10-19 20:07:10

+1 for removing items.

Konrad Rudolph 2009-11-02 10:51:02

One doesn't need to remove anything. Put the first array in a set, then start adding the second array. The first value that increases the sets length is your unique. It's effective but it assumes the presence of a set structure. It's the same to assume unique is the first element in the second array and simply return it.

DaClown 2010-06-29 12:50:26

Answer 8

+5 A:

Here is a mathematical approach, inspired by Kevin's answer and its comments.

Let's call the arrays A and B and let their unique elements be a and b, respectively. First, take the sums of both arrays and subtract one from the other; since everything else cancels, sum(A) - sum(B) = a - b = s. Then, multiply the elements of both arrays and divide one by the other. Again, things cancel, so mult(A) / mult(B) = a / b = r. Now, from these, we get a = rb, so rb - b = s or b = s / (r - 1) and then a = rs / (r - 1).

I'm calling this mathematical because multiplying things out might not be a reasonable thing to do in a real program. The key is to have two different operations that both individually allow the canceling behavior and so that one distributes over the other. This latter property is used when going from rb - b = s to b = s / (r - 1), and that won't work, say, with addition and XOR, which was my first attempt.

jk 2009-10-19 19:25:43

Answer 9

A:

Put each element of the first array into a hash. For each item in the second array, if it's already in the hash, remove it, else add it. At the end you have a hash with two keys, which are your unique elements.

Simple verison in Ruby

def unique(a,b)
  h={}
  a.each do |ax|
    h[ax]=true
  end
  b.each do |bx|
    if h[bx]
      h.delete(bx)
    else
      h[bx]=true
    end
  end
  h.keys
end

With a little more Ruby personality, but still in the universe where we can't simply do (a | b) - (a & b) or a.to_set ^ b.to_set:

def unique(a,b)
  h = {}
  a.each do |ax|
    h[ax] = true
  end
  b.each do |bx|
    h.delete(bx) or h[bx] = true
  end
  h.keys
end

glenn mcdonald 2009-10-27 03:33:32

Answer 10

+4 A:

This can be solved quickly from just the sum and the sum of the squares of the two sequences. And calculating these sums will certainly be faster than the hashes that are suggested, and doesn't involve any comparisons between the sequence items.

Here's how to do it: If the two sets are {a_i} and {b_i}, then call A and B their sums, and A2 and B2 are the sum of the squares, i.e. A2 = Sum({a_i²}), and for convenience, D=A-B, and D2=A2-B2. Therefore, D=a-b and D2=a²-b², where a and b are the two elements that are different, and from this we see

a = (D²+D2)/(2*D)
b = a - D

This works out because, from algebra, a²-b²=(a+b)(a-b) or D2=(a+b)D, so a+b=D2/D, and since we also know a-b, we can find a and b.

An example in Python may be more convincing

a, b = 5, 22   # the initial unmatched terms
x, y = range(15), range(15)
y[a] = b
print "x =", x  # x = [0, 1, 2, 3, 4,  5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print "y =", y  # y = [0, 1, 2, 3, 4, 22, 6, 7, 8, 9, 10, 11, 12, 13, 14]

D = sum(x) - sum(y)
D2 = sum([i**2 for i in x]) - sum([i**2 for i in y])  #element-wise squaring
a = (D2+D*D)/(2*D)
b = a - D

print "a=%i, b=%i" % (a, b)
#prints a=5, b=22  which is correct

(Of course, this is somewhat similar to jk's answer, except it doesn't require the multiplication of all the terms and the huge numbers that would result, but thanks to jk for the idea of a mathematical approach.)

tom10 2009-11-01 07:27:27

+1 it works regardless of whether the arrays are sorted or not and it needs linear computation time

DaClown 2010-06-29 12:43:13

Answer 11

+2 A:

In LINQ:

var unique1 = (from a in arrayA where !arrayB.Contains(a) select a).First();
var unique2 = (from b in arrayB where !arrayA.Contains(b) select b).First();
return new Pair(unique1, unique2);

...

public sealed class Pair<T0, T1>
{
    public T0 Item1 {get;set;}
    public T1 Item2 {get;set;}
    public Pair(T0 item1, T1 item2)
    {
        Item1 = item1;
        Item2 = item2;
    }
    //plus GetHashCode, equality etc.
}

RCIX 2009-11-01 08:34:24

Answer 12

+1 A:

The logic behind almost all of the previous answers is always the same: use set operations from mathematics to solve the problem.

A set in mathematics can contain each element only once. So the following list can’t be a set in the mathematical sense, since it contains one number (3) twice:

{ 1, 2, 3, 4, 3, 5 }

Since set operations, in particular checking whether an element already exists in a set, are common operations, most languages have data structures that implement these set operations efficiently. So we can simply fall back to this in our solution:

// Construct set from first list:
Set uniques = Set.from(list1);

// Iterate over second list, check each item’s existence in set.
for each (item in list2)
    if (not uniques.Contains(item))
        return item;

Different implementations of sets yield different performance, but this performance will always be superior to the naive solution (for large lists). In particular, two implementations exist:

As a (self-balanced) search tree. The tree has its elements sorted, thus looking up a particular element is efficient by using a binary search (or variant thereof). Lookup thus has a performance of O(log n). Creating the tree set from the input has performance O(n · log n). This is also the overall performance.
Hash tables can be implemented to have average-case look-up performance of O(1) (and with a few tricks, this can also be made the worst-case performance). Creating the hash table can be done in O(n). Therefore, hash tables can achieve an overall runtime of O(n).
Bloom filters offer a nice probabilistic solution – i.e. the solution may actually be wrong, but we can control how (un)likely this will be. It is particularly interesting because it’s very space-efficient.
… many other implementations exist.

In every case, the usage remains the same and the above pseudo-code gives a textbook solution to your problem. A Java implementation might look as follows:

// Construct set from first list:
Set<Integer> uniques = new HashSet<Integer>(list1);

// Iterate over second list, check each item’s existence in set.
for (int item : list2)
    if (! uniques.Contains(item))
        return item;

Notice how this looks almost exactly like the pseudo-code. Solutions in C#, C++ or other languages wouldn’t be much different.

EDIT Oops, I’ve just noticed that the requested return value is the pair of mismatching elements. However, this requirement doesn’t change the reasoning and almost doesn’t change the pseudo-code (do the same thing, with interchanged lists).

Konrad Rudolph 2009-11-01 21:45:35

Answer 13

A:

Make sets A, B of the arrays a,b respectively. A \ B gives you the additional integer in A. B \ A gives you the additional integer in B.

If either one of these operations returns an empty set, then the additional integer is in the array twice. You find this out when constructing the set: adding a duplicate integer to the set does not increase the set's size.

Ralph Rickenbach 2010-06-29 13:44:33

ansaurus

tags:

views:

answers:

Distinguishing extra element from two arrays?

related questions