I am looking for a solution to the following problem with minimal time and space complexity.

Given two arrays A and B, find all pairs of elements (a1, b1) such that a1 belongs to A, b1 belongs to B, and a1 + b1 = k, where k is a given integer.

I was able to come up with an O(n log n) approach: sort one of the arrays, say A, and for each element b in B, binary search the sorted A for the value k - b.
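
A minimal sketch of that sort-then-binary-search idea, assuming `int` elements and ignoring overflow in `k - y`. The name `find_pairs_bsearch` is hypothetical; `std::equal_range` is used so that repeated values in A each produce a pair.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: sort A once, then binary search it for k - y
// for every y in B. A is taken by value so the caller's data is untouched.
std::vector<std::pair<int, int>> find_pairs_bsearch(std::vector<int> a,
                                                    const std::vector<int>& b,
                                                    int k) {
    std::sort(a.begin(), a.end());  // O(n log n)
    std::vector<std::pair<int, int>> pairs;
    for (int y : b) {
        // All occurrences of k - y in sorted A, found in O(log n).
        auto range = std::equal_range(a.begin(), a.end(), k - y);
        for (auto it = range.first; it != range.second; ++it) {
            pairs.emplace_back(*it, y);
        }
    }
    return pairs;
}
```

The total cost is O(n log n) plus the number of pairs reported.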

Can we improve it any further?

+4  A: 

If you have a bound on the maximum possible value (let's call it M), then you can have a solution in O(M+n).

Create a boolean array initialized to false and mark as true the entry for each value that appears in A. Then, for each element b of B, check whether entry k-b is marked true.
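
As a rough sketch of the boolean-array idea, assuming all values are non-negative integers no larger than a known bound (`max_value` plays the role of M; the function name `find_pairs_table` is hypothetical):

```cpp
#include <utility>
#include <vector>

// Assumption: every element of a and b lies in [0, max_value].
std::vector<std::pair<int, int>> find_pairs_table(const std::vector<int>& a,
                                                  const std::vector<int>& b,
                                                  int k, int max_value) {
    std::vector<bool> present(max_value + 1, false);  // O(M) space
    for (int x : a) {
        present[x] = true;  // mark every value that occurs in A
    }
    std::vector<std::pair<int, int>> pairs;
    for (int y : b) {
        const int need = k - y;
        // Guard the index: k - y may fall outside the table.
        if (need >= 0 && need <= max_value && present[need]) {
            pairs.emplace_back(need, y);
        }
    }
    return pairs;
}
```

Note that the table only records whether a value occurs in A, so repeated values in A are reported once per matching element of B.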

You can improve on this by using a hash map instead of a big array. But I would consider that in this kind of question a hash map is kind of cheating.

Anyway, it would give you O(n) for insertion and then O(n) for the queries, so O(n) in total (expected time).
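
For comparison, the hash-based variant might look like the sketch below. It assumes `int` elements and uses `std::unordered_set`, whose lookups are constant time on average, not in the worst case. The name `find_pairs_hashed` is hypothetical.

```cpp
#include <unordered_set>
#include <utility>
#include <vector>

std::vector<std::pair<int, int>> find_pairs_hashed(const std::vector<int>& a,
                                                   const std::vector<int>& b,
                                                   int k) {
    // Insert all of A: O(n) expected.
    std::unordered_set<int> in_a(a.begin(), a.end());
    std::vector<std::pair<int, int>> pairs;
    for (int y : b) {
        // One expected O(1) lookup per element of B.
        if (in_a.count(k - y) != 0) {
            pairs.emplace_back(k - y, y);
        }
    }
    return pairs;
}
```

Unlike the big array, the space used here depends on n and not on the largest value.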

EDIT :

One case where this might be useful.

  • You have unsorted vectors of size 10^6, so sorting them and doing the match is O(n log n) with n = 10^6.
  • You need to do this operation 10^6 times (on different vectors), for a total complexity of O(n*n*log n).
  • Maximum value is 10^9.

Using my idea with an integer array instead of a boolean one (the mark value is incremented at each run, so the array never has to be cleared) gives you a complexity of:

  • "O(10^9)" to create the array (and the same complexity in space)
  • O(n) at each run, so O(n*n) for the total.

You are using more space, but you've increased speed by a factor of log(n) ~= 20 in this case!
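
The run-counter trick can be sketched as follows, again assuming values in [0, max_value]. The class name `StampedMatcher` and its members are hypothetical. The table is allocated once, and each run marks the values of A with the current run number, so stale marks from earlier runs never match.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

class StampedMatcher {
public:
    // One-time O(M) allocation shared by all runs.
    explicit StampedMatcher(int max_value)
        : stamp_(static_cast<std::size_t>(max_value) + 1, 0), run_(0) {}

    // Assumption: every element of a lies in [0, max_value].
    std::vector<std::pair<int, int>> find_pairs(const std::vector<int>& a,
                                                const std::vector<int>& b,
                                                int k) {
        ++run_;  // invalidates all marks from earlier runs in O(1)
        for (int x : a) {
            stamp_[x] = run_;
        }
        const int max_value = static_cast<int>(stamp_.size()) - 1;
        std::vector<std::pair<int, int>> pairs;
        for (int y : b) {
            const int need = k - y;
            if (need >= 0 && need <= max_value && stamp_[need] == run_) {
                pairs.emplace_back(need, y);
            }
        }
        return pairs;
    }

private:
    std::vector<unsigned> stamp_;  // run number that last marked each value
    unsigned run_;                 // wraps only after about 4 * 10^9 runs
};
```

Each call costs O(n) regardless of M, which is where the saving over re-sorting on every run comes from.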

Loïc Février
+1 for the first solution, which will work. But the space complexity is a function of M (the maximum possible number) and not n; allocating this much space in the real world is not acceptable.
TopCoder
Yes, I know. But it depends on what M is. If all the values are between 1 and 1000 but you have 1,000,000 of them in each array (unsorted), then using it can be efficient. It depends on the problem ;)
Loïc Février
Of course for the same cost you can sort the arrays...
Loïc Février
Why would you consider a hash table cheating? You can write an ad hoc implementation in 10 minutes (if no standard one is available) and get linear time and space complexity. Seems like the optimal solution. Anyway, +1 for mentioning it.
Nikita Rybak
+1 for hash map solution.
Mark Byers
+8  A: 

If the arrays are sorted you can do it in linear time and constant storage.

  • Start with two pointers, one pointing at the smallest element of A, the other pointing to the largest element of B.
  • Calculate the sum of the two pointed-to elements.
  • If it is smaller than k, increment the pointer into A so that it points to the next larger element.
  • If it is larger than k, decrement the pointer into B so that it points to the next smaller element.
  • If it is exactly k you've found a pair. Move one of the pointers and keep going to find the next pair.

If the arrays are initially unsorted, then you can first sort them and then use the above algorithm. There are a few different approaches for sorting them that you could use, depending on the type of data you expect:

A comparison sort will require O(n log n) time on average. The last two are faster than O(n log(n)) but can be impractical if the range of possible values in the input arrays is very large.

Mark Byers
Does that find all pairs, or just the first?
Platinum Azure
@Platinum Azure: I believe that finds all unique pairs.
Mark Byers
Once you have found one, you continue by advancing the pointer in A, and the process continues until you've found another pair.
Loïc Février
I had this solution in mind, but even with two pointers I cannot avoid iterating over both arrays completely, as I need all pairs for which the condition holds. Am I correct?
TopCoder
If they are sorted, then each pointer only ever moves forward (A) or backward (B): at most 2*n moves in total, so it's linear.
Loïc Février
@Loic .. got it . Thanks!
TopCoder
@Mark Byers: Thanks for making it more clear with your edit. I suppose I just misunderstood (in the first version you never specified what to do after finding the pair, and I assumed print and terminate since you didn't specify explicit instructions). My apologies. I didn't downvote you, so no hard feelings I hope.
Platinum Azure
+1  A: 
#include <algorithm>
#include <utility>
#include <vector>

// Returns pairs (x, y) with x from a, y from b and x + y == k.
// Both pointers advance after a match, so repeated values are not
// expanded into every possible combination.
template< typename T >
std::vector< std::pair< T, T > >
find_pairs( 
    std::vector< T > a, std::vector< T > b, T const & k  ) {
    // a and b are taken by value so they can be sorted locally.

    std::vector< std::pair< T, T > > matches;

    std::sort( a.begin(), a.end() );  // O( A * lg A )
    std::sort( b.begin(), b.end() );  // O( B * lg B )

    typename std::vector< T >::iterator ait = a.begin();           // smallest of a
    typename std::vector< T >::reverse_iterator bit = b.rbegin();  // largest of b

    // Single loop: stop as soon as either array is exhausted.
    while( ait != a.end() && bit != b.rend() ) {

        const T sum = *ait + *bit;

        if( sum == k ) {
            matches.push_back( std::pair< T, T >( *ait, *bit ) );
            ++ait;
            ++bit;
        }
        else if( sum < k ) {
            ++ait;
        }
        else {  // sum > k
            ++bit;
        }
    }  // O( A + B )
    return matches;
}
ArunSaha
+1  A: 

I would create a hash table containing the elements of one array, then iterate the other array looking up k - a(n), generating an output element if the lookup succeeded. This will use O(n) space and time.

In C#, it might look like this:

var bSet = new HashSet<int>(B);  // assumes int elements
var results = from a in A
              let b = k - a
              where bSet.Contains(b)
              select new { a, b };
Gabe