tags:
views: 623
answers: 7

Taken from Introduction to Algorithms

Describe a Θ(n lg n)-time algorithm that, given a set S of n integers and another integer x, determines whether or not there exist two elements in S whose sum is exactly x.

This is my best solution implemented in Java so far:

    public static boolean test(int[] a, int val) {
        mergeSort(a);

        for (int i = 0; i < a.length - 1; ++i) {
            int diff = (val >= a[i]) ? val - a[i] : a[i] - val;

            if (Arrays.binarySearch(a, i, a.length, diff) >= 0) {
                return true;
            }
        }

        return false;
    }

Now my 1st question is: Is this a correct solution? From my understanding, mergeSort should perform the sort in O(n lg n), and the loop should take O(n lg n) (n iterations, each doing an O(lg n) binary search), for O(2n lg n) = O(n lg n) overall, so it should be correct.

My 2nd question is: Are there any better solutions? Is sorting the array essential?

+1  A: 

Your analysis is correct, and yes, you must sort the array: binary search only works on sorted input.

Sean Owen
+2  A: 

Your solution looks fine. Yes, you need to sort, because it's a prerequisite for binary search. You can make a slight modification to your logic as follows:

public static boolean test(int[] a, int val) 
{
    Arrays.sort(a);

    int i = 0;            // index of first element.
    int j = a.length - 1; // index of last element.

    while (i < j)
    {
        // If the sum of the elements at index i and j equals val, we are done.
        if (a[i] + a[j] == val)
            return true;
        // Else if the sum is more than val, decrease it.
        else if (a[i] + a[j] > val)
            j--;
        // Else the sum is less than val, so increase it.
        else
            i++;
    }
    // Failed to find any such pair; return false.
    return false;
}
codaddict
+2  A: 

I do think I have spotted a minor bug in your implementation, but testing should uncover that one quickly.

The approach looks valid, and will reach the desired performance. You might simplify it by replacing the iterative binary search with a scan through the array, in effect replacing the binary search by a linear search that resumes where the previous linear search left off:

int j = a.length - 1;
for (int i = 0; i < a.length; i++) {
    while (j > 0 && a[i] + a[j] > val) { // keep j from running past the start
        j--;
    }
    if (i != j && a[i] + a[j] == val) { // i != j: don't pair an element with itself
        // heureka!
    }
}

This step is O(n). (Proving that is left as an exercise for you.) Of course, the entire algorithm still takes O(n log n) for the merge sort.

meriton
is heureka a mix of heuristics and eureka ?
Valentin Rocher
@Bishiboosh: No, it's the German transliteration of the Greek word. I didn't know the English transliteration dropped the H. The things you learn on stack overflow ... :-)
meriton
+1  A: 
  1. This is correct; your algorithm will run in O(n lg n) time.

  2. There is a bug, though: your logic for calculating diff is incorrect. Regardless of whether a[i] is greater than or less than val, you still need diff to be val - a[i].
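A minimal sketch of the questioner's method with that fix applied. The class name is made up, and Arrays.sort stands in for the questioner's mergeSort; the binary search also starts at i + 1 so that a[i] is never matched against itself when val == 2 * a[i] (a separate pitfall mentioned elsewhere in this thread):

```java
import java.util.Arrays;

public class PairSumFixed {
    public static boolean test(int[] a, int val) {
        Arrays.sort(a); // stand-in for the questioner's mergeSort

        for (int i = 0; i < a.length - 1; ++i) {
            int diff = val - a[i]; // always val - a[i], whatever its sign

            // Search from i + 1 so a[i] cannot be paired with itself.
            if (Arrays.binarySearch(a, i + 1, a.length, diff) >= 0) {
                return true;
            }
        }

        return false;
    }
}
```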

danben
+2  A: 

Here's an O(n) solution using a hash-set:

    public static boolean test(int[] a, int val) {
        // First handle pairs where both elements equal val/2:
        // we need at least two such elements in the array.
        int c = 0;
        for (int n : a) {
            if (n * 2 == val)
                ++c;
        }
        if (c >= 2)
            return true; // Yes! - found more than one

        // Now look for pairs not including val/2.
        // (Arrays.asList does not unbox an int[], so add elements one by one.)
        Set<Integer> set = new HashSet<Integer>();
        for (int n : a)
            set.add(n);
        for (int n : a) {
            if (n * 2 == val)
                continue;
            if (set.contains(val - n))
                return true;
        }

        return false;
    }
Itay
What if there are collisions during HashSet lookup? Their number must be bounded for the amortized lookup to be in O(1). What about the time to allocate the backing array? How do you know a O(n) backing array is enough?
meriton
@meriton I don't follow you. For all practical purposes (and in particular in this type of question) a hash-table lookup can be considered O(1). If we were talking about exact time then the points you raise may be valid, but still, an O(n) algorithm will eventually beat O(n log n) if n is large enough.
Itay
For instance, assume hashCode(i) = i % size. I now insert n multiples of size into the set. The hashSet has degenerated to a linear list.
meriton
@meriton: your comment makes no sense. Itay's solution is O(n), just as mine.
Webinator
hashCode(i) == i
Itay
And what if I now replied that your comment makes no sense either? Do you think this discussion would get anywhere? To clarify: The proof that a HashSet offers O(1) lookup assumes a good distribution of hash values, which depends on your input. If all numbers in the input get the same bucket in the hashSet, the HashSet will be pretty slow (java.util.HashSet uses a LinkedList to hold the contents of a bucket). Usually, your input will distribute well, but not always. Hence, a hashSet does not guarantee a constant worst-case lookup complexity.
meriton
@Itay: true, `hashCode(i)==i`, but HashSet doesn't use 2^32 buckets, it uses fewer than that. So the real hash function used is a modulus over the number of buckets. AFAIK the Java HashSet re-hashing strategy doesn't guarantee a worst-case O(1) elements per bucket (except in the trivial sense that almost anything done on bounded-size integers is O(1)), since it's based on load factor alone, not on the occupancy of the worst bucket in the table.
Steve Jessop
@OldEnthusiast: meriton is right, this is **not** `O(n)`, because hash-table lookups are not worst-case `O(1)` (which means they are not worst-case `Θ(1)`), they are average-case `Θ(k/n)` (see http://en.wikipedia.org/wiki/Hash_table#Performance_analysis). When a problem says *"find an O(something)-time algorithm,"* it always means worst-case. However, I will not give a -1, since, though it is not the answer to this particular problem, it *is* the solution I would use if I encountered this problem in the real world.
BlueRaja - Danny Pflughoeft
@BlueRaja: "When a problem says "find an O(something)-time algorithm," it always means worst-case." - this is false and represents a poor understanding of big-O notation. Big-O has everything to do with the growth rate of a particular function and nothing to do with best/worst/average case. A function can very well be O(n) in the average case and O(n^2) in the worst case.
danben
@danben: I understand that; but when they don't state *"worst-case,"* *"average-case,"* or *"best-case"* in a problem for an Algorithms class, it's always implicitly understood that the problem is asking for *"worst-case."* This answer is very good in the average case, but not in the worst-case, which is where the confusion comes from, because the OP did not explicitly say one or the other.
BlueRaja - Danny Pflughoeft
@danben: presumably if Itay is allowed to assume average case instead of worst case, then I'm allowed to assume best case instead of worst case, and present an algorithm which is Theta(n) best case but Theta(n^2) average and worst case? The reason worst case should be assumed unless stated otherwise is precisely to bar students from this kind of hijinks.
Steve Jessop
+3  A: 

There's another very fast solution: imagine you have to solve this problem in Java for about 1 billion integers. You know that in Java integers go from -2^31 to 2^31 - 1.

Create a bit array with 2^32 bits (500 MB, trivial to allocate on today's hardware).

Iterate over your set: if you have an integer, set corresponding bit to 1.

O(n) so far.

Iterate again over your set: for each value v, check whether the bit at x - v is set.

If you have one, you return true.

Granted, it needs 500 MB of memory.

But this will run rings around any other O(n log n) solution if you have to, say, solve the problem for 1 billion integers.

O(n).
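A sketch of this bit-array idea in Java, with a few hedges: the class and method names are made up; java.util.BitSet can only index up to 2^31 - 1, so two bit sets (one for non-negative values, one for negative) cover the full int range; and checking for the complement *before* inserting each value avoids a false positive when x is twice a lone element:

```java
import java.util.BitSet;

public class PairSumBits {
    // True if value v (held as a long to dodge int overflow) was inserted earlier.
    private static boolean seen(BitSet pos, BitSet neg, long v) {
        if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE)
            return false; // complement out of int range: no element can match
        return v >= 0 ? pos.get((int) v) : neg.get((int) ~v);
    }

    public static boolean test(int[] a, int x) {
        BitSet pos = new BitSet(); // bit v set  => value v seen (v >= 0)
        BitSet neg = new BitSet(); // bit ~v set => value v seen (v < 0)
        for (int n : a) {
            if (seen(pos, neg, (long) x - n))
                return true; // the complement of n is already in the set
            if (n >= 0)
                pos.set(n);
            else
                neg.set(~n); // ~n maps -1, -2, ... to 0, 1, ...
        }
        return false;
    }
}
```

This is still O(n) time; the worst-case memory is the full 2^32 bits described above, though BitSet only grows as far as the highest bit actually set.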

Webinator
So you allocate, and zero-out, 500 MB of memory, to solve this problem for n=10000?
meriton
I quote from this answer: "Imagine you have to solve this problem in Java for about 1 billions integers". I think it's pretty clear that when OldEnthusiast says "very fast", he means very fast for large problems, i.e. low complexity. This is not the most efficient solution for n=1: `return false;` is ;-)
Steve Jessop
+1 poster asked for Θ(n log n) time, this trumps even that; he said nothing about space constraints :)
BlueRaja - Danny Pflughoeft
-1. This does not work for all input and for small input, the hidden constant is huge.
Moron
@Moron: what input does this fail for? That the questioner's code works for, I mean: there's the bug where if x is equal to twice the value of a number in the set, then you can get a false positive, but assuming that "set" in the question means no duplicates, the fix is the same in both cases and just requires checking the special case.
Steve Jessop
@Steve Jessop: 32-bit integers are assumed. I don't see that stated anywhere in the problem statement. Do you? 64-bit machines are common these days.
Moron
@Moron: The problem also doesn't say anything about memory constraints, so search the list for the largest `n` then create an array of that size and run this algorithm. That is `O(k)` (where k is the largest value), due to the zeroing of the array. Algorithms such as this are known as pseudo-polynomial (http://en.wikipedia.org/wiki/Pseudo-polynomial_time), and while helpful in some *(many)* cases in the real world, it is not a solution to this problem as `k` is unrelated to `n`.
BlueRaja - Danny Pflughoeft
@BlueRaja, the problem does not talk about the memory constraints. It does not talk about a model of computation on which to base the time complexity etc, either. Do you really want to go into that discussion?
Moron
@Moron: I am agreeing with you why are you arguing with me
BlueRaja - Danny Pflughoeft
@BlueRaja: That was not an argument. Anyway, sorry I misunderstood you.
Moron
@Moron: The solution given is in Java, so int is 32bit regardless of the word size of the machine. Granted, this solution does not work in all programming languages or for all possible formats of the input integers, but I asked for inputs where this fails *and the questioner's code works*. Of course possibly you would have voted down the questioner's code on the same basis (doesn't handle 64bit input), had it been offered as an answer :-)
Steve Jessop
Nothing stops you from having a strategy at the beginning where you use the HashSet O(n) solution for small samples and the 500 MB array solution for huge samples. Note that the original question specifically passes an int[] containing the "set of integers" to his O(n log n) solution.
Webinator
@Steve Jessop: If/when Java moves to 64 bits, the questioner has to do nothing, but this 500MB solution has to change and the memory usage will become impossible to handle (2^64 bits of memory is probably more than the atoms in the galaxy etc etc). In other words, this does not scale. @OldEnthusiast: The basic idea of your algorithm is still unscalable.
Moron
If anyone is carefully writing their Java code against the possibility that `int` will someday be 64 bits, I'll happily point and laugh at them.
Steve Jessop
You have it backwards. Read again. Anyway, I am not interested in discussing this further. I have stated my case.
Moron
+1  A: 

A simple solution is, after sorting, to move pointers inward from both ends of the array, looking for pairs that sum to x. If the sum is too high, decrement the right pointer. If too low, increment the left one. If the pointers cross, the answer is no.

Neal Gafter