views: 983

answers: 11

[Description] Given two integer arrays of the same length, design an algorithm that judges whether they are the same. The definition of "same" is that, if the two arrays were in sorted order, the elements at corresponding positions would be equal.

[Example]
<1 2 3 4>  = <3 1 2 4>
<1 2 3 4> != <3 4 1 1>

[Limitation] The algorithm should require constant extra space, and O(n) running time.

+4  A: 

You can try a probabilistic approach - convert each array into a number in some huge base B, modulo some prime P: for example, sum B^a_i over all i, mod some big-ish prime P. If both arrays come out to the same number, try again with as many primes as you want. If the values differ on any attempt, the arrays are not equal. If they pass enough challenges, they are equal, with high probability.

There's a trivial proof for B > N and P greater than the biggest possible value: then distinct multisets yield distinct numbers, so for unequal arrays there must be a challenge that cannot be met. This is actually the deterministic approach, though the complexity analysis might be more difficult, depending on how one measures complexity in terms of the size of the input (as opposed to just the number of elements).
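A minimal sketch of this fingerprinting idea in Python; the specific primes and the random base are illustrative choices, not part of the answer, and elements are assumed non-negative here:

```python
import random

# A few large primes to draw challenges from (arbitrary choices).
PRIMES = [10**9 + 7, 10**9 + 9, 998244353]

def fingerprint(arr, base, prime):
    # Sum of base**a_i mod prime: order-independent, so equal
    # multisets always produce equal fingerprints.
    return sum(pow(base, a, prime) for a in arr) % prime

def probably_same(a, b, trials=10):
    if len(a) != len(b):
        return False
    for _ in range(trials):
        p = random.choice(PRIMES)
        base = random.randrange(2, p)
        if fingerprint(a, base, p) != fingerprint(b, base, p):
            return False  # a failed challenge: definitely different
    return True  # equal, with high probability
```

A single mismatch is conclusive; only the "equal" verdict is probabilistic.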

Larry
Always a chance of collision.
Seva Alekseyev
I'm not convinced that this is 100%, but hey, if you're on an interview, you can do worse, probably.
Larry
example of collision?
Timmy
@Timmy, this will surely sometimes have collisions, because the amount of different *arrays* is much greater than the amount of different *integers*.
Pavel Shved
Why do you say "high probability" instead of calculating it? The ideal probability that can't be overcome is (1 - 1/MAX_INT).
Pavel Shved
*With high probability* is an actual term used in probabilistic algorithms - http://www.algorithmist.com/index.php/With_high_probability
Larry
@Larry, it's hard to use the wiki article you wrote as a trusted source ;-) Anyway, thanks for the info. :-)
Pavel Shved
Haha, I meant that as a shorthand to writing it out, though that is also the first result in Google! ;)
Larry
And it seems like all the definitions are in papers. But generally, the term is used when you can make the probability as small as you want.
Larry
+2  A: 
  1. Insert all elements from the first array into a hashtable
  2. Try to insert all elements from the second array into the same hashtable - for each insert, the element should already be there

Ok, this is not constant extra space, but it's the best I could come up with at the moment :-). Are there any other constraints imposed on the question, such as the biggest integer that may be included in the array?
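A Python sketch of this idea that also stores counts rather than bare membership, so duplicates are handled (the comments below point out that plain membership testing is not enough):

```python
from collections import Counter

def same_multiset(a, b):
    # Pass 1: count occurrences in the first array.
    counts = Counter(a)
    # Pass 2: consume the counts with the second array.
    for x in b:
        if counts[x] == 0:
            return False  # b contains x more times than a does
        counts[x] -= 1
    # The arrays have equal length, so every count is back to zero.
    return True
```

This assumes the two arrays have equal length, as the question states; the table still costs O(n) extra space in the worst case.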

pajton
"Insert all elements from the first array into a hashtable" You need O(n) memory to allocate a hash table, while you are only given O(1).
Pavel Shved
@Pavel: no, you don't need O(n). O(n) would mean that if the arrays were doubled in size (for example) you'd expect the hash table to also (approximately) double in size. That's not the case at all.
Jerry Coffin
@Pavel in worst case I need O(n), I stated that in my post. I am aware this is just partial solution, but maybe can be improved.
pajton
@Jerry, @pajton, you'll need O(n) in *any* case, provided all values in the array are different. When you add an element to a table, you *have to* store the exact value you've added (for future comparisons). It will be either a new item in a list (for bucket lists), or a requirement to have at least n buckets (open addressing). These all are O(n).
Pavel Shved
@pajton, taking no account of the cost of extra space, your solution is still wrong. Take the following as an example: <1 1 3 3>, <1 3 3 3>. Can you say they're the same?
meta
@meta Right, but here is an easy fix: use a hashmap instead of a hashtable and store the count of every element with its hash. Still `O(n)`.
pajton
+12  A: 
KennyTM
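For context, a textbook LSD radix sort for bounded non-negative integers might look like the sketch below; note that this plain version uses O(N) auxiliary space, unlike the in-place variant the comments here discuss:

```python
def radix_sort(arr, word_bits=32, radix_bits=8):
    # LSD radix sort: one stable bucket pass per radix_bits-wide digit.
    # O(N * word_bits / radix_bits) time for bounded non-negative ints.
    mask = (1 << radix_bits) - 1
    for shift in range(0, word_bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for x in arr:
            buckets[(x >> shift) & mask].append(x)
        arr = [x for bucket in buckets for x in bucket]
    return arr
```

Sorting both arrays this way and comparing them element-wise gives the O(N)-time check, at the cost of the auxiliary space.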
Only if they are bounded integers.
Larry
Given that we're talking about a computer here, they're always bounded. Granted, an integer that occupied (say) 16 gigabytes would have an extremely large upper bound, but it's still a bound.
Jerry Coffin
By that logic, you can say the naive "hashtable" solution is right, by the fact that you just set the hashtable at a constant, but huge, size.
Larry
@Larry: if you look at my answer, I did say exactly that.
Jerry Coffin
So the conclusion of this discussion is that it cannot be done
Christopher Oezbek
@Larry: The hash table solution is correct only if you can layer the table "in-place" with one of the arrays. (Thus use O(1) space). But the radix sort uses O(1) space as long as the integer word size is O(log N). Read the abstract.
KennyTM
Radix sort's running time is not `O(N)` for unbounded integers.
Larry
@Larry: That's true. Therefore the remark in the last line.
KennyTM
By that logic, on bounded integer, create a hash table of size MAX_INT. This is a constant, I'm sure of it! ;P
Larry
@Larry: That's a bit different. The integer range needs to grow super-polynomially with array size.
KennyTM
I didn't say it's efficient, just that it's `O(1)` extra space.
Larry
@Larry: I did not say anything about efficiency.
KennyTM
Then what are you saying?
Larry
@Larry: I mean only if the array is `[1,3,9,27,81,...]` then radix sort will fail.
KennyTM
A: 

For each array, use the counting sort technique to build the count of elements less than or equal to each particular value. Then compare the two auxiliary arrays at every index: if they are equal, the arrays are equal, otherwise they are not. Counting sort requires O(n), and the array comparison at every index is again O(n), so in total it is O(n); the space required is equal to the size of the two auxiliary arrays. Here is a link to counting sort: http://en.wikipedia.org/wiki/Counting_sort.

Nagaraj
The space required is O(n). Does not fly.
Seva Alekseyev
Sorry, I think I jumped early
Nagaraj
+2  A: 

A few answers are basically correct, even though they don't look like it. The hash table approach (for one example) has an upper limit based on the range of the type involved rather than the number of elements in the arrays. At least by most definitions, that makes the upper limit on the space a constant, although the constant may be quite large.

In theory, you could change that from an upper limit to a true constant amount of space. Just for example, if you were working in C or C++, and it was an array of char, you could use something like:

size_t counts[UCHAR_MAX + 1];

Since UCHAR_MAX is a constant, the amount of space used by the array is also a constant.

Edit: I'd note for the record that a bound on the ranges/sizes of items involved is implicit in nearly all descriptions of algorithmic complexity. Just for example, we all "know" that Quicksort is an O(N log N) algorithm. That's only true, however, if we assume that comparing and swapping the items being sorted takes constant time, which can only be true if we bound the range. If the range of items involved is large enough that we can no longer treat a comparison or a swap as taking constant time, then its complexity would become something like O(N log N log R), where R is the range, so log R approximates the number of bits necessary to represent an item.
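A sketch of the bounded-range counting idea in Python; the value range of 256 mirrors the `char` example above and is an assumption for illustration:

```python
RANGE = 256  # UCHAR_MAX + 1: size depends on the type's range, not on n

def same_bounded(a, b):
    # "Constant" extra space: the table size is fixed by RANGE.
    counts = [0] * RANGE
    for x in a:
        counts[x] += 1
    for x in b:
        counts[x] -= 1
    # Every count returns to zero exactly when the multisets match.
    return all(c == 0 for c in counts)
```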

Jerry Coffin
I do not agree with you. By this reasoning, all algorithms operating in O(n) would be constant: if you have an array of size n, and you know n is expressed as an integer, it has fewer than MAX_INT elements, therefore constant.
pajton
It's what makes these interview questions academic, sometimes. `O(N)` at all costs isn't practical!
Larry
@pajton: not really comparable. You're talking about a limit on the size of the array being processed. I'm talking about a limit on the range of the items IN the array, regardless of its size.
Jerry Coffin
@Pajton Consider the answer that KennyTM gave. It links to an O(n) radix sort that only works if the range of the integers is limited. Otherwise sorting cannot work in O(n).
Christopher Oezbek
I see your point. But the author didn't say anything about a bound on the elements in the array, so if you assume their range is bounded, that's the same kind of assumption as bounding the length.
pajton
"a bound on the ranges/sizes of items involved is implicit in nearly all descriptions of algorithmic complexity." - not necessarily, the alternative is to say that your big-O analyses of comparison sorts don't estimate time, but numbers of comparisons, copies, and time excluding them.
Steve Jessop
@Steve: I think we mostly agree here. I said: "nearly all" for a reason -- while it's certainly possible to be more explicit, it's only really common in places like textbooks and standards.
Jerry Coffin
True, and as long as people are commonly seen making absurd statements like, "no it isn't O(N^2), it's O(N)", there's probably not much point worrying too much how precise they're being as to what they're describing in the first place ;-)
Steve Jessop
A: 

Is this a trick question? If the authors assumed integers to be within a given range (2^32 etc.) then "extra constant space" might simply be an array of size 2^32 in which you count the occurrences in both lists.

If the integers are unranged, it cannot be done.

Christopher Oezbek
A: 

You could add each element into a hashmap<Integer, Integer>, with the following rules: Array A is the adder, array B is the remover. When inserting from Array A, if the key does not exist, insert it with a value of 1. If the key exists, increment the value (keep a count). When removing, if the key exists and is greater than 1, reduce it by 1. If the key exists and is 1, remove the element.

Run through array A followed by array B using the rules above. If at any time during the removal phase array B does not find an element, you can immediately return false. If after both the adder and remover are finished the hashmap is empty, the arrays are equivalent.

Edit: The size of the hashtable will be equal to the number of distinct values in the array - does this fit the definition of constant space?
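The adder/remover rules can be sketched in Python like this (the answer suggests Java's hashmap<Integer, Integer>; a plain dict stands in here):

```python
def equivalent(a, b):
    counts = {}
    # Adder phase: array A increments counts.
    for x in a:
        counts[x] = counts.get(x, 0) + 1
    # Remover phase: array B decrements, deleting keys that reach zero.
    for x in b:
        if x not in counts:
            return False  # b has an element (or an extra copy) a lacks
        if counts[x] == 1:
            del counts[x]
        else:
            counts[x] -= 1
    return not counts  # empty map means the arrays are equivalent
```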

kgrad
No, it is not equal to the length of the array, it is equal to the number of distinct values in the array.
Svante
@Svante: whoops, true!
kgrad
A: 

Given that the ints are in the range -n..+n, a simple way to check for equality may be the following (pseudo code):

// a & b are the arrays
accumulator = 0
arraysize = size(a)
for(i=0 ; i < arraysize; ++i) {
  accumulator = accumulator + a[i] - b[i]
  if abs(accumulator) > ((arraysize - i) * n) { return FALSE }
}
return (accumulator == 0)

The accumulator must be able to store integers in the range +- (arraysize * n)

Francesco R.
Your code is not working properly for [1,2,4] [2,3,2]
Victor Hurdugaci
A: 

I imagine the solution will require some sort of transformation that is both associative and commutative and guarantees a unique result for a unique set of inputs. However I'm not sure if that even exists.

Niki Yoshiuchi
I was thinking something along those lines, eg. proving that for subset A,B of Z\{0,1} where |A| = |B|, then if sum(a_i) = sum(b_i) and prod(a_i) = prod(b_i) we must have A = B after some permutation ... the hard part is the proof :P
Mads Ravn
Exactly! I tried using sum() and prod() but it failed on several inputs, for example: [-10, -10, -10, -1, 9] != [6, -3, 5, -10, -10] but their sums and products are equal.
Niki Yoshiuchi
+1  A: 

I claim that unless the range of input is specified, it is IMPOSSIBLE to solve in constant extra space and O(n) running time.

I will be happy to be proven wrong, so that I can learn something new.

ArunSaha
Can you prove the lower bound for this problem?
meta
A: 

How 'bout this - XOR all the numbers in both the arrays. If the result is 0, you got a match.

DevNull
No - that doesn't work: consider the two 2-element arrays `a[2] = { 1, 3 };` and `b[2] = { 2, 0 };`. Using your method, a and b would be considered to be identical.
Paul R
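The collision is easy to reproduce: XOR only tracks per-bit parity, so many different multisets cancel to the same value. A quick check in Python:

```python
from functools import reduce
from operator import xor

a = [1, 3]
b = [2, 0]

# XOR over both arrays is 0, yet the sorted arrays differ.
assert reduce(xor, a + b) == 0
assert sorted(a) != sorted(b)
```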