Hi,

Design an algorithm to find all pairs of integers within an array that sum to a specified value.

This was not an assignment question. I tried this problem using a hash table to store sums of array elements, but it is not an efficient solution, so I want to know what other Stack Overflow readers think about it.

Thanks.

+2  A: 

How about sorting the array, then marching in from both ends?

Beta
That's exactly where I was heading. It keeps the number of operations to a minimum.
Bomlin
The sorting of the array itself can further be optimized by keeping track of the smallest and largest numbers during the initial pass, and computing a maximum (and possibly a minimum) threshold that can be used to eliminate numbers that are too big (and possibly too small) during the next iteration.
mjv
You can discard numbers that are too big only if you assume that the integers are non-negative. And how can you discard a number for being too small?
Beta
I agree with @Beta. It gets too complicated if you have negative numbers, which were not mentioned in the question. That's why I resorted to a full sort and binary search.
Ayman
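A minimal sketch of the sort-and-march-inward idea (Python used purely for illustration; the function and variable names are my own):

```python
def pairs_with_sum(values, target):
    """Sort, then walk inward from both ends looking for pairs."""
    nums = sorted(values)
    lo, hi = 0, len(nums) - 1
    pairs = []
    while lo < hi:
        s = nums[lo] + nums[hi]
        if s == target:
            pairs.append((nums[lo], nums[hi]))
            lo += 1
            hi -= 1
        elif s < target:
            lo += 1   # sum too small: advance the low end
        else:
            hi -= 1   # sum too large: retreat the high end
    return pairs
```

After sorting, each iteration discards one element, so the march itself is O(n); the sort dominates at O(n log n). Negative numbers are handled naturally, since sorting places them first.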
+2  A: 

Assume required sum = R

  1. sort the array
  2. for each number in the array A(n), do a binary search to find the number A(x) such that A(n) + A(x) = R
Ayman
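A sketch of the two steps above, assuming Python and its standard `bisect` module (the restriction of each search to indices after the current element, so each pair is reported once, is my own addition):

```python
import bisect

def pairs_by_binary_search(values, r):
    """For each element a, binary-search the sorted array for r - a."""
    nums = sorted(values)            # step 1: O(n log n)
    pairs = []
    for i, a in enumerate(nums):     # step 2: n binary searches, O(n log n)
        # only search to the right of i, so each pair is found once
        j = bisect.bisect_left(nums, r - a, i + 1)
        if j < len(nums) and nums[j] == r - a:
            pairs.append((a, r - a))
    return pairs
```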
No need to search - you can do this much more efficiently than that.
Nick Johnson
Efficiently in what sense? The sort is O(n log n), the cost of n binary searches is only O(n log n). Optimizing the searches is therefore pretty pointless. Do you mean you have an answer better than O(n log n) overall?
chrispy
Isn't binary search O(log n)?
Ayman
If the array is already sorted, your algorithm is O(n log n), but O(n) is possible.
leiz
Indeed, the sort is O(n log n), but the search can be O(n) by moving in from each end of the array. The overall complexity is still O(n log n), but that doesn't mean that the latter solution isn't strictly more efficient.
Nick Johnson
A: 

I don't see why the hash table approach is inefficient, at least in algorithm-analysis terms; admittedly, in memory-locality terms it can be quite bad. Anyway, scan the array twice...

First scan - put all the array elements in the hash table - amortized O(n) total.

Second scan - check for (sum - current) in the hash table - O(n) total.

This beats the O(n log n) sort-and-search methods, at least in theory.

Then, note that you can combine the two scans into one. You can spot a pair as soon as you encounter the second of that pair during the first scan. In pseudocode...

for i in array.range
  diff = sum - array [i]
  if hashset.includes (diff)
    output diff, array [i]

  hashset.insert (array [i])

If you need positions of the items, use a hashmap and store item positions in it. If you need to cope with duplicates, you might need to store counts in a hashmap. For positions and duplicates, you might need a hashmap of start pointers for linked lists of positions.
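A runnable version of the single-pass idea, assuming Python and a plain set; checking before inserting avoids falsely pairing a single element with itself when sum is twice that element:

```python
def pairs_single_pass(values, target):
    """Report each pair once, as soon as its second member is seen."""
    seen = set()
    pairs = []
    for v in values:
        diff = target - v
        if diff in seen:          # the partner was seen earlier
            pairs.append((diff, v))
        seen.add(v)               # insert only after the check
    return pairs
```

Swapping the set for a dict mapping value to position (or to a list of positions) covers the positions and duplicates cases described above.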

This makes assumptions about the hash table implementation, but fairly safe ones given the usual implementations in most current languages and libraries.

BTW - combining the scans shouldn't be seen as an optimisation. The iteration overhead should be insignificant. Memory locality issues could make a single pass slightly more efficient for very large arrays, but the real memory locality issues will be in the hashtable lookups anyway.

IMO the only real reason to combine the scans is because you only want each pair reported once - handling that in a two-scan approach would be a bit more hassle.

Steve314
A: 

If you don't mind spending O(M) space, where M is the sum you are seeking, you can do this in O(N + M) time. On a single pass over the N elements, set sums[v] = 1 for each value v <= M; then, on a single pass over 0..M/2, check (sums[i] && sums[M-i]).

Jeff Williams
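A sketch of this presence-array idea, assuming non-negative integers (and using counts rather than flags so the i == M-i case needs two copies of the same value; names are my own):

```python
def has_pair_counting(values, m):
    """O(N + M) time, O(M) space: mark which values <= m occur,
    then scan i from 0 to m//2 looking for both i and m - i."""
    counts = [0] * (m + 1)
    for v in values:
        if 0 <= v <= m:           # values outside [0, m] can't be in a pair
            counts[v] += 1
    for i in range(m // 2 + 1):
        j = m - i
        if counts[i] and counts[j]:
            # when i == j, a pair needs two copies of that value
            if i != j or counts[i] >= 2:
                return True
    return False
```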