tags:
views: 623
answers: 7

Taken from Introduction to Algorithms

Describe a Θ(n lg n)-time algorithm that, given a set S of n integers and another integer x, determines whether or not there exist two elements in S whose sum is exactly x.

This is my best solution implemented in Java so far:

    public static boolean test(int[] a, int val) {
        mergeSort(a);

        for (int i = 0; i < a.length - 1; ++i) {
            int diff = (val >= a[i]) ? val - a[i] : a[i] - val;

            if (Arrays.binarySearch(a, i, a.length, diff) >= 0) {
                return true;
            }
        }

        return false;
    }

Now my 1st question is: Is this a correct solution? From my understanding, mergeSort should perform the sort in O(n lg n), and the loop should take O(n lg n) (n iterations, each doing an O(lg n) binary search), for O(2n lg n) = O(n lg n) overall, so it should be correct.

My 2nd question is: Are there any better solutions? Is sorting the array essential?

+1  A: 

Your analysis is correct, and yes, you must sort the array: binary search only works on sorted input.

Sean Owen
+2  A: 

Your solution looks fine. Yes, you need to sort, because it's a prerequisite for binary search. You can make a slight modification to your logic as follows:

public static boolean test(int[] a, int val) 
{
    Arrays.sort(a);

    int i = 0;            // index of first element.
    int j = a.length - 1; // index of last element.

    while (i < j)
    {
        // If the sum of the elements at index i and j equals val, we are done.
        if (a[i] + a[j] == val)
            return true;
        // Else if the sum is more than val, decrease it.
        else if (a[i] + a[j] > val)
            j--;
        // Else the sum is less than val, so increase it.
        else
            i++;
    }
    // Failed to find any such pair; return false.
    return false;
}
codaddict
+2  A: 

I do think I have spotted a minor bug in your implementation, but testing should uncover that one quickly.

The approach looks valid, and will reach the desired performance. You might simplify it by replacing the iterative binary search with a scan through the array, in effect replacing the binary search by a linear search that resumes where the previous linear search left off:

int j = a.length - 1;
for (int i = 0; i < a.length; i++) {
    while (j > 0 && a[i] + a[j] > val) { // keep j from running past the start
        j--;
    }
    if (i != j && a[i] + a[j] == val) { // i != j: don't pair an element with itself
        // heureka!
    }
}

This step is O(n). (Proving that is left as an exercise for you.) Of course, the entire algorithm still takes O(n log n) for the merge sort.

meriton
is heureka a mix of heuristics and eureka ?
Valentin Rocher
@Bishiboosh: No, it's the German transliteration of the Greek word. I didn't know the English transliteration dropped the H. The things you learn on stack overflow ... :-)
meriton
+1  A: 
  1. This is correct; your algorithm will run in O(n lg n) time.

  2. There is a bug, though: your logic for calculating diff is incorrect. Regardless of whether a[i] is greater than or less than val, you still need diff to be val - a[i].
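A minimal sketch of the questioner's method with that fix applied. The class name is made up, and Arrays.sort stands in for the questioner's mergeSort; the binary search also starts at i + 1 so that a[i] is never matched against itself when val == 2 * a[i] (a separate pitfall mentioned elsewhere in this thread):

```java
import java.util.Arrays;

public class PairSumFixed {
    public static boolean test(int[] a, int val) {
        Arrays.sort(a); // stand-in for the questioner's mergeSort

        for (int i = 0; i < a.length - 1; ++i) {
            int diff = val - a[i]; // always val - a[i], whatever its sign

            // Search from i + 1 so a[i] cannot be paired with itself.
            if (Arrays.binarySearch(a, i + 1, a.length, diff) >= 0) {
                return true;
            }
        }

        return false;
    }
}
```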

danben
+2  A: 

Here's an O(n) solution using a hash-set:

    public static boolean test(int[] a, int val) {
        // First handle pairs where both elements equal val/2:
        // we need at least two such elements in the array.
        int c = 0;
        for (int n : a) {
            if (n * 2 == val)
                ++c;
        }
        if (c >= 2)
            return true; // Yes! - found more than one

        // Now look for pairs not including val/2.
        // (Arrays.asList does not unbox an int[], so add elements one by one.)
        Set<Integer> set = new HashSet<Integer>();
        for (int n : a)
            set.add(n);
        for (int n : a) {
            if (n * 2 == val)
                continue;
            if (set.contains(val - n))
                return true;
        }

        return false;
    }
Itay
What if there are collisions during HashSet lookup? Their number must be bounded for the amortized lookup to be in O(1). What about the time to allocate the backing array? How do you know a O(n) backing array is enough?
meriton
@meriton I don't follow you. For all practical purposes (and in particular in this type of question) a hash-table lookup can be considered O(1). If we were talking about exact time then the points you raise may be valid, but still, an O(n) algorithm will eventually beat O(n log n) if n is large enough.
Itay
For instance, assume hashCode(i) = i % size. I now insert n multiples of size into the set. The hashSet has degenerated to a linear list.
meriton
@meriton: your comment makes no sense. Itay's solution is O(n), just as mine.
Webinator
hashCode(i) == i
Itay
And what if I now replied that your comment makes no sense either? Do you think this discussion would get anywhere? To clarify: The proof that a HashSet offers O(1) lookup assumes a good distribution of hash values, which depends on your input. If all numbers in the input get the same bucket in the hashSet, the HashSet will be pretty slow (java.util.HashSet uses a LinkedList to hold the contents of a bucket). Usually, your input will distribute well, but not always. Hence, a hashSet does not guarantee a constant worst-case lookup complexity.
meriton
@Itay: true, `hashCode(i)==i`, but HashSet doesn't use 2^32 buckets, it uses fewer than that. So the real hash function used is a modulus over the number of buckets. AFAIK the Java HashSet re-hashing strategy doesn't guarantee a worst-case O(1) elements per bucket (except in the trivial sense that almost anything done on bounded-size integers is O(1)), since it's based on load factor alone, not on the occupancy of the worst bucket in the table.
Steve Jessop
@OldEnthusiast: meriton is right, this is **not** `O(n)`, because hash-table lookups are not worst-case `O(1)` (which means they are not worst-case `Θ(1)`), they are average-case `Θ(k/n)` (see http://en.wikipedia.org/wiki/Hash_table#Performance_analysis). When a problem says *"find an O(something)-time algorithm,"* it always means worst-case. However, I will not give a -1, since, though it is not the answer to this particular problem, it *is* the solution I would use if I encountered this problem in the real world.
BlueRaja - Danny Pflughoeft
@BlueRaja: "When a problem says "find an O(something)-time algorithm," it always means worst-case." - this is false and represents a poor understanding of big-O notation. Big-O has everything to do with the growth rate of a particular function and nothing to do with best/worst/average case. A function can very well be O(n) in the average case and O(n^2) in the worst case.
danben
@danben: I understand that; but when they don't state *"worst-case,"* *"average-case,"* or *"best-case"* in a problem for an Algorithms class, it's always implicitly understood that the problem is asking for *"worst-case."* This answer is very good in the average case, but not in the worst-case, which is where the confusion comes from, because the OP did not explicitly say one or the other.
BlueRaja - Danny Pflughoeft
@danben: presumably if Itay is allowed to assume average case instead of worst case, then I'm allowed to assume best case instead of worst case, and present an algorithm which is Theta(n) best case but Theta(n^2) average and worst case? The reason worst case should be assumed unless stated otherwise is precisely to bar students from this kind of hijinks.
Steve Jessop
+3  A: 

There's another very fast solution: imagine you have to solve this problem in Java for about 1 billion integers. You know that in Java integers go from -2^31 to 2^31 - 1.

Create a bit array with 2^32 bits (500 MB, trivial to allocate on today's hardware).

Iterate over your set: if you have an integer, set corresponding bit to 1.

O(n) so far.

Iterate again over your set: for each value v, check whether the bit at x - v is set.

If you have one, you return true.

Granted, it needs 500 MB of memory.

But this will run rings around any other O(n log n) solution if you have to, say, solve the problem for 1 billion integers.

O(n).
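A sketch of this bit-array idea in Java, with a few hedges: the class and method names are made up; java.util.BitSet can only index up to 2^31 - 1, so two bit sets (one for non-negative values, one for negative) cover the full int range; and checking for the complement *before* inserting each value avoids a false positive when x is twice a lone element:

```java
import java.util.BitSet;

public class PairSumBits {
    // True if value v (held as a long to dodge int overflow) was inserted earlier.
    private static boolean seen(BitSet pos, BitSet neg, long v) {
        if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE)
            return false; // complement out of int range: no element can match
        return v >= 0 ? pos.get((int) v) : neg.get((int) ~v);
    }

    public static boolean test(int[] a, int x) {
        BitSet pos = new BitSet(); // bit v set  => value v seen (v >= 0)
        BitSet neg = new BitSet(); // bit ~v set => value v seen (v < 0)
        for (int n : a) {
            if (seen(pos, neg, (long) x - n))
                return true; // the complement of n is already in the set
            if (n >= 0)
                pos.set(n);
            else
                neg.set(~n); // ~n maps -1, -2, ... to 0, 1, ...
        }
        return false;
    }
}
```

This is still O(n) time; the worst-case memory is the full 2^32 bits described above, though BitSet only grows as far as the highest bit actually set.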

Webinator
So you allocate, and zero-out, 500 MB of memory, to solve this problem for n=10000?
meriton
I quote from this answer: "Imagine you have to solve this problem in Java for about 1 billions integers". I think it's pretty clear that when OldEnthusiast says "very fast", he means very fast for large problems, i.e. low complexity. This is not the most efficient solution for n=1: `return false;` is ;-)
Steve Jessop
+1 poster asked for Θ(n log n) time, this trumps even that; he said nothing about space constraints :)
BlueRaja - Danny Pflughoeft
-1. This does not work for all input and for small input, the hidden constant is huge.
Moron
@Moron: what input does this fail for? That the questioner's code works for, I mean: there's the bug where if x is equal to twice the value of a number in the set, then you can get a false positive, but assuming that "set" in the question means no duplicates, the fix is the same in both cases and just requires checking the special case.
Steve Jessop
@Steve Jessop: 32-bit integers are assumed. I don't see that stated anywhere in the problem statement. Do you? 64-bit machines are common these days.
Moron
@Moron: The problem also doesn't say anything about memory constraints, so search the list for the largest `n` then create an array of that size and run this algorithm. That is `O(k)` (where k is the largest value), due to the zeroing of the array. Algorithms such as this are known as pseudo-polynomial (http://en.wikipedia.org/wiki/Pseudo-polynomial_time), and while helpful in some *(many)* cases in the real world, it is not a solution to this problem as `k` is unrelated to `n`.
BlueRaja - Danny Pflughoeft
@BlueRaja, the problem does not talk about the memory constraints. It does not talk about a model of computation on which to base the time complexity etc, either. Do you really want to go into that discussion?
Moron
@Moron: I am agreeing with you why are you arguing with me
BlueRaja - Danny Pflughoeft
@BlueRaja: That was not an argument. Anyway, sorry I misunderstood you.
Moron
@Moron: The solution given is in Java, so int is 32bit regardless of the word size of the machine. Granted, this solution does not work in all programming languages or for all possible formats of the input integers, but I asked for inputs where this fails *and the questioner's code works*. Of course possibly you would have voted down the questioner's code on the same basis (doesn't handle 64bit input), had it been offered as an answer :-)
Steve Jessop
Nothing stops you from having a strategy at the beginning where you use the HashSet O(n) solution for small samples and the 500 MB array solution for huge samples. Note that the original question specifically passes an int[] containing the "set of integers" to his O(n log n) solution.
Webinator
@Steve Jessop: If/when Java moves to 64 bits, the questioner has to do nothing, but this 500MB solution has to change and the memory usage will become impossible to handle (2^64 bits of memory is probably more than the atoms in the galaxy etc etc). In other words, this does not scale. @OldEnthusiast: The basic idea of your algorithm is still unscalable.
Moron
If anyone is carefully writing their Java code against the possibility that `int` will someday be 64 bits, I'll happily point and laugh at them.
Steve Jessop
You have it backwards. Read again. Anyway, I am not interested in discussing this further. I have stated my case.
Moron
+1  A: 

A simple solution is, after sorting, to move pointers inward from both ends of the array, looking for pairs that sum to x. If the sum is too high, decrement the right pointer. If too low, increment the left one. If the pointers cross, the answer is no.

Neal Gafter