tags:
views: 1289
answers: 12

Hi,

I have a question that I have tried to think through again and again, but I've gotten nowhere, so I'm posting it here. Maybe I can get some other viewpoints and make progress...

The question: we are given a SORTED array consisting of numbers that each occur an EVEN number of times, except one, which occurs an ODD number of times. We need to find that number in O(log n) time.

It is easy to find the solution in O(n) time, but in O(log n) time it looks much trickier.

Any suggestion would be appreciated. Thanks.
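For reference, the O(n) solution alluded to above can be a single XOR pass, since every value occurring an even number of times cancels itself out; the class and method names below are just illustrative:

```java
// O(n) baseline: XOR all elements together. Equal values occurring an
// even number of times cancel to 0, so only the odd-count value survives.
// Note this never uses the fact that the array is sorted.
class XorBaseline {
    static int findOdd(int[] a) {
        int x = 0;
        for (int v : a)
            x ^= v;
        return x;
    }
}
```

The interesting part of the question is beating this bound by exploiting the sorted order.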

+13  A: 

A sorted array suggests a binary search, but we have to redefine equality and comparison. Here, "equality" simply means a group with an odd number of elements. We can do the comparison by looking at the index of the first or last element of a group: the first element of a group sits at an even index (0-based) before the odd group, and at an odd index after it. We can find the first and last elements of a group using binary search. The total cost is O((log N)²).

PROOF OF O((log N)²)

  T(2) = 1 //to make the summation nice
  T(N) = log(N) + T(N/2) //log(N) is finding the first/last elements

For some N=2^k,

T(2^k) = (log 2^k) + T(2^(k-1))
       = (log 2^k) + (log 2^(k-1)) + T(2^(k-2))
       = (log 2^k) + (log 2^(k-1)) + (log 2^(k-2)) + ... + (log 2^2) + 1
       = k + (k-1) + (k-2) + ... + 1
       = k(k+1)/2
       = (k² + k)/2
       = (log(N)² + log(N))/ 2
       = O(log(N)²)
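The algorithm described above could be sketched in Java as follows (a sketch, not code from the answer itself: it finds the first and last index of the probed value's group with inner binary searches, returns the value if its group has odd size, and otherwise uses the parity of the group's first index to pick a side):

```java
// O((log n)^2): outer binary search over groups, with inner binary
// searches to find the first/last occurrence of the probed value.
class OddOccurrence {

    // First index of a[i]'s group, searching within [lo, i].
    static int firstOf(int[] a, int i, int lo) {
        int hi = i, val = a[i];
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] == val) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

    // Last index of a[i]'s group, searching within [i, hi].
    static int lastOf(int[] a, int i, int hi) {
        int lo = i, val = a[i];
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so lo always advances
            if (a[mid] == val) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    static int findOdd(int[] a) {
        // Invariant: lo is the first index of some group, hi is the last
        // index of some group, and the odd group lies within [lo, hi].
        int lo = 0, hi = a.length - 1;
        while (true) {
            int mid = (lo + hi) >>> 1;
            int first = firstOf(a, mid, lo);
            int last = lastOf(a, mid, hi);
            if ((last - first) % 2 == 0)
                return a[mid];   // odd-sized group: this is the answer
            if (first % 2 == 0)
                lo = last + 1;   // boundary at even index: odd group is to the right
            else
                hi = first - 1;  // boundary at odd index: odd group is to the left
        }
    }
}
```

Each outer step discards at least the probed group and everything on its "wrong" side, which is exactly the T(N) = log(N) + T(N/2) recurrence analyzed above.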
Nabb
Very nice answer
jdelator
Nitpick: an O(log N) algorithm is also an O(log² N) algorithm. The Ω is more interesting here.
Matthieu M.
@Matthieu... Nitpick the nitpick: The Theta is even more interesting :-)
Moron
@Moron: Yes, I guess so :p However, the current discussion seems to have evolved into proving that it is impossible to do better than log² N. I still don't see how that can be proved, though; sorting algorithms showed that, given some constraints, you can get better asymptotic performance than the common models suggest, so I wonder if something similar could apply here, somehow.
Matthieu M.
On the other hand, while an input with 0s, 1s and 2s is tailor-made for an O(n) counting sort, binary search is still worst-case optimal at Θ(log n) queries.
This is a nice answer, but the question asks for an O(log n) solution and this does not achieve it — only O((log n)^2), which many others also got. So why is this the highest voted answer? :p
ShreevatsaR
@ShreevatsaR: I agree, why is the less optimal algorithm getting the upvotes here?
Dean J
Well, this is still a good answer, with a concise explanation of the best known algorithm and a complete analysis. Given the proof contained in the answer that is now thankfully highest-voted, this algorithm is optimal, and deserves upvotes. :-)
ShreevatsaR
+1  A: 

Ahhh. There is an answer.

Do a binary search, and as you search, for each value, move backward until you find the first entry with that same value. If its index is even, it is before the oddball, so move to the right.
If its array index is odd, it is after the oddball, so move to the left.

In pseudocode (this is the general idea, not tested...):

    private static int FindOddBall(int[] ary)
    {
        int l = 0,
            r = ary.Length - 1;
        int n = (l+r)/2;
        while (r > l+2)
        {
            n = (l + r) / 2;
            while (ary[n] == ary[n-1])
                n = FindBreakIndex(ary, l, n);
            if (n % 2 == 0) // even index: we are at or to the left of the oddball
                l = n;
            else            // odd index: we are to the right of the oddball
                r = n-1;
        }
        return ary[l];
    }
    private static int FindBreakIndex(int[] ary, int l, int n)
    {
        var t = ary[n];
        var r = n;
        while(ary[n] != t || ary[n] == ary[n-1])
            if(ary[n] == t)
            {
                r = n;
                n = (l + r)/2;
            }
            else
            {
                l = n;
                n = (l + r)/2;
            }
        return n;
    }
Charles Bretana
Looks at least `O(n)` to me.
IVlad
I don't think so, as this algorithm is based on a binary search... doubling the size of the array would not double the number of iterations, it would only increase it by 1... As I understand it, that's O(log n)
Charles Bretana
@Charles Bretana - doubling the size of the array **would** double the number of iterations of your algorithm. Consider an array of `n - 1` equal numbers and 1 other number. Your innermost while loop is `O(n)` because it will iterate over `n / 2` elements.
IVlad
Your example is a special case, which can be corrected by changing the inner while loop to do the same thing (see EDIT). In general, each outer iteration eliminates half the array, the half on the 'wrong' side of the element being examined, so no, it would not...
Charles Bretana
@Charles Bretana - special case or not, it makes your current algorithm `O(n)`. I don't see any edit, but if we're thinking about the same thing it will still not be `O(log n)` but `O((log n)^2)`.
IVlad
Why the cryptic variable names?
Svante
@IVlad, you may be right about `O((log n)^2)`. In general, nested binary iterations would give you `O((log n)^2)`, as you said, but this scenario, I think, may be different, because here they are not totally independent nested iterations. Neither the inner nor the outer iteration must iterate over every element, only those elements not eliminated by the other iteration... This will definitely reduce the number of processing steps required and may mitigate the "O-ness" of the algorithm (although I must confess that I'm not sure exactly how to calculate that.)
Charles Bretana
@Svante, sorry, I reedited to use simpler variable names...
Charles Bretana
@Charles - they actually are totally independent, why do you say they're not? I'm not sure what you mean by "iterate over every element" - they're binary searches, of course they don't iterate over every element. The problem is that you have an inner search that binary searches a range that was binary searched by your outer search (weird sentence, I know). I'm sure the algorithm is very fast in practice, but strictly speaking it **is** `O((log n)^2)` because the recurrence is `T(n) = log n + T(n / 2)` which is easily shown to be log square.
IVlad
@IVlad, I'll defer to your judgment, as I'm not sure myself how to calculate the O-ness... What I meant, however, is perhaps best illustrated by your example (an array of n - 1 equal numbers and 1 other number): in this case my outer iteration would only process once (maybe twice) and the inner iteration would handle the rest of the work, whereas for n/2 pairs of different numbers and one single number, the inner search would process only once per pair, and the outer iteration would handle the bulk of the work...
Charles Bretana
@IVlad, how to measure this? Run it with an array of 1,000 elements, then 10,000 elements, and then 100,000, and compare the results?
Charles Bretana
@Charles - I think it's going to be hard to determine if it's log square or log, especially if you have a fast processor. You could try comparing your algorithm with a binary search for a million or even 10 million elements, preferably on a slower computer. That might tell us something. Make sure to run them multiple times and take the average.
IVlad
@IVlad, I will try that... Do you see what I mean? As the algorithm searches, each portion, segment, (or "piece") of the array is processed and eliminated either by the outer iteration or by the inner one, but never both...
Charles Bretana
@Charles - I see, but I'm not sure I agree :). It would be nice if others shared their thoughts.
IVlad
@IVlad, I couldn't ask for more... Frankly, till I test this, I'm not sure either... but I appreciate that you understand the point...
Charles Bretana
+5  A: 

Start at the middle of the array and walk backward until you get to a value that's different from the one at the center. Check whether the number above that boundary is at an odd or even index. If it's odd, then the number occurring an odd number of times is to the left, so repeat your search between the beginning and the boundary you found. If it's even, then the number occurring an odd number of times must be later in the array, so repeat the search in the right half.

As stated, this has both a logarithmic and a linear component. If you want to keep the whole thing logarithmic, then instead of just walking backward through the array to a different value, use a binary search. Unless you expect many repetitions of the same numbers, though, the binary search may not be worthwhile.
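A sketch of the simple form just described, with the linear walk kept explicit (this is not the answer author's code; replacing the two walks with binary searches yields the fully logarithmic O((log n)²) version):

```java
// Binary search over groups; the extent of the probed group is found by
// walking linearly, so a long run of equal values makes one step O(n).
class WalkBackSearch {
    static int findOdd(int[] a) {
        int lo = 0, hi = a.length - 1;
        while (true) {
            int mid = (lo + hi) >>> 1;
            int first = mid, last = mid;
            while (first > lo && a[first - 1] == a[mid]) first--; // walk backward
            while (last < hi && a[last + 1] == a[mid]) last++;    // walk forward
            if ((last - first) % 2 == 0)
                return a[mid];   // this group has odd size: it's the answer
            if (first % 2 == 0)
                lo = last + 1;   // group starts at even index: odd group is to the right
            else
                hi = first - 1;  // group starts at odd index: odd group is to the left
        }
    }
}
```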

Jerry Coffin
Isn't this `O((log n)^2)` since you are repeating the binary search `log n` times in the worst case?
IVlad
@IVlad: not really -- after each search, you discard (at least) half the array, so in each subsequent search, the amount you're searching also decreases logarithmically.
Jerry Coffin
`(log n)+(log n/2)+(log n/4)+... = log(n^(log n)*2^(-(log n)((log n)-1)/2)) = (log n)²-(log n)((log n)-1)/2 = (log n)²/2+(log n)/2 = O((log n)²)`. For WolframAlpha: "sum log_2(2^k/2^i) for i=0..k-1".
Nabb
@Jerry Coffin - yes, but in order to discard that half you have to run another binary search, which makes it log square. I think posting some pseudocode will make this clearer.
IVlad
+8  A: 

Look at the middle element of the array. With a couple of appropriate binary searches, you can find its first and last appearance in the array. E.g., if the middle element is 'a', you need to find i and j as shown below:

[* * * * a a a a * * *]
         ^     ^ 
         |     |
         |     |
         i     j

Is j - i an even number? You are done! Otherwise (and this is the key here), the question to ask is: is i an even or an odd number? Do you see what this piece of knowledge implies? Then the rest is easy.

Dimitris Andreou
Ah, I see Jerry gave it away! :)
Dimitris Andreou
If j-i is even, then you have to repeat your algorithm with half the array. The *i*th iteration takes (log N) - i steps so you have O ( (log N) ^ 2 ) steps.
Pete Kirkham
Not really. At the ith iteration, the two binary searches only need log(N / 2^i) - at each step, the array portion against which I have to do the two binary searches at least halves.
Dimitris Andreou
@Dimitris - you can almost halve your number of binary searches by only searching for one bound on each iteration. Half the time you'll find the odd count on the larger side, but you don't need to worry about unbalanced bisection. To get that, you need a large number of repeats of the same value, which is handled pretty quickly when it spans the centre of the current range, so it's not a worst case.
Steve314
@Dimitris log(N / 2^i) = log(N) - i, so it **is** O(log(n)^2)
Rotsor
oops, yes, you are right, O(log(n)^2) indeed, sloppy of me.
Dimitris Andreou
@Steve314, also right. Though I don't see this affecting the big oh analysis. Still wondering what the O(logn) solution would be (without assumptions on the input).
Dimitris Andreou
+3  A: 

I have an algorithm which works in log(N/C)·log(K) time, where K is the length of the maximum same-value range and C is the length of the range being searched for.

The main difference between this algorithm and most of those posted before is that it takes advantage of the case where all same-value ranges are short. It finds boundaries not by binary-searching the entire array, but by first quickly finding a rough estimate by jumping back by 1, 2, 4, 8, ... steps (log(K) iterations), and then binary-searching the resulting range (log(K) again).

The algorithm is as follows (written in C#):

// Finds the start of the range of equal numbers containing the index "index", 
// which is assumed to be inside the array
// 
// Complexity is O(log(K)) with K being the length of range
static int findRangeStart (int[] arr, int index)
{
    int candidate = index;
    int value = arr[index];
    int step = 1;

    // find the boundary for binary search:
    while(candidate>=0 && arr[candidate] == value)
    {
        candidate -= step;
        step *= 2;
    }

    // binary search:
    // (guard: if we overshot the left end and arr[0] still matches,
    // the range starts at index 0 and there is nothing left to search)
    if(candidate < 0 && arr[0] == value)
        return 0;
    int a = Math.Max(0,candidate);
    int b = candidate+step/2;
    while(a+1!=b)
    {
        int c = (a+b)/2;
        if(arr[c] == value)
            b = c;
        else
            a = c;
    }
    return b;
}

// Finds the index after the only "odd" range of equal numbers in the array.
// The result should be in the range (start; end]
// The "end" is considered to always be the end of some equal number range.
static int search(int[] arr, int start, int end)
{
    if(arr[start] == arr[end-1])
        return end;

    int middle = (start+end)/2;

    int rangeStart = findRangeStart(arr,middle);

    if((rangeStart & 1) == 0)
        return search(arr, middle, end);
    return search(arr, start, rangeStart);
}

// Finds the index after the only "odd" range of equal numbers in the array
static int search(int[] arr)
{
    return search(arr, 0, arr.Length);
}
Rotsor
I didn't read your code entirely, but +1 for you seem the first one that observe that `K` has to be part of the complexity of the problem. Maybe you could add a textual description of your ideas?
Jens Gustedt
@Jens Gustedt, Added something...
Rotsor
A: 

We don't have any information about the distribution of lengths inside the array, or about the array as a whole, right?

So the array length might be 1, 11, 101, 1001 or anything odd, at least 1 with no upper bound, and it must contain from 1 distinct value ('number') up to (length-1)/2 + 1 distinct values; for total sizes of 1, 11, 101 that is 1, 1 to 6, and 1 to 51 distinct values, and so on.

Shall we assume every possible size to be equally probable? This would lead to a mean sublist length of size/4, wouldn't it?

An array of size 5 could be divided into 1, 2 or 3 sublists.

What seems to be obvious is not that obvious, if we go into details.

An array of size 5 can be 'divided' into one sublist in just one way, with arguable right to call it 'dividing'. It's just a list of 5 elements (aaaaa). To avoid confusion let's assume the elements inside the list to be ordered characters, not numbers (a,b,c, ...).

Divided into two sublists, they might be (1, 4), (2, 3), (3, 2), (4, 1). (abbbb, aabbb, aaabb, aaaab).

Now let's look back at the claim made before: shall the 'division' (5) be assumed to have the same probability as those 4 divisions into 2 sublists? Or shall we mix them together and assume every partition to be equally probable (1/5)?

Or can we calculate the solution without knowing the probability of the length of the sublists?

user unknown
From my point of view the problem description is very clear: An array of length n containing sorted numbers. Each number occurs an even number of times, except one, which occurs an odd number of times. The except one implies that n >= 1, there are no further implications! (a), (a,a,a), (a,...(2*i times a)...,a,b) etc are all valid inputs.
Dave
No problem so far, but to decide which algorithm is best for finding said number, we have to make assumptions about typical arrays, i.e. how often each case occurs, which isn't trivial for an unlimited number of possible problems. Maybe you can show an algorithm which produces all arrays of length 5, or all arrays up to length 5, or with another restriction, but the way you decide to produce those arrays will influence the probabilities.
user unknown
+20  A: 

Theorem: Every deterministic algorithm for this problem probes Ω(log² n) memory locations in the worst case.

Proof (completely rewritten in a more formal style):

Let k > 0 be an odd integer and let n = k². We describe an adversary that forces (log₂(k + 1))² = Ω(log² n) probes.

We call the maximal subsequences of identical elements groups. The adversary's possible inputs consist of k length-k segments x1 x2 … xk. For each segment xj, there exists an integer bj ∈ [0, k] such that xj consists of bj copies of j - 1 followed by k - bj copies of j. Each group overlaps at most two segments, and each segment overlaps at most two groups.

Group boundaries
|   |     |   |   |
 0 0 1 1 1 2 2 3 3
|     |     |     |
Segment boundaries

Wherever there is an increase of two, we assume a double boundary by convention.

Group boundaries
|     ||       |   |
 0 0 0  2 2 2 2 3 3

Claim: The location of the jth group boundary (1 ≤ j ≤ k) is uniquely determined by the segment xj.

Proof: It's just after the ((j - 1) k + bj)th memory location, and xj uniquely determines bj. //

We say that the algorithm has observed the jth group boundary in case the results of its probes of xj uniquely determine xj. By convention, the beginning and the end of the input are always observed. It is possible for the algorithm to uniquely determine the location of a group boundary without observing it.

Group boundaries
|   X   |   |     |
 0 0 ? 1 2 2 3 3 3
|     |     |     |
Segment boundaries

Given only 0 0 ?, the algorithm cannot tell for sure whether ? is a 0 or a 1. In context, however, ? must be a 1, as otherwise there would be three odd groups, and the group boundary at X can be inferred. These inferences could be problematic for the adversary, but it turns out that they can be made only after the group boundary in question is "irrelevant".

Claim: At any given point during the algorithm's execution, consider the set of group boundaries that it has observed. Exactly one consecutive pair is at odd distance, and the odd group lies between them.

Proof: Every other consecutive pair bounds only even groups. //

Define the odd-length subsequence bounded by the special consecutive pair to be the relevant subsequence.

Claim: No group boundary in the interior of the relevant subsequence is uniquely determined. If there is at least one such boundary, then the identity of the odd group is not uniquely determined.

Proof: Without loss of generality, assume that each memory location not in the relevant subsequence has been probed and that each segment contained in the relevant subsequence has exactly one location that has not been probed. Suppose that the jth group boundary (call it B) lies in the interior of the relevant subsequence. By hypothesis, the probes to xj determine B's location up to two consecutive possibilities. We call the one at odd distance from the left observed boundary odd-left and the other odd-right. For both possibilities, we work left to right and fix the location of every remaining interior group boundary so that the group to its left is even. (We can do this because they each have two consecutive possibilities as well.) If B is at odd-left, then the group to its left is the unique odd group. If B is at odd-right, then the last group in the relevant subsequence is the unique odd group. Both are valid inputs, so the algorithm has uniquely determined neither the location of B nor the odd group. //

Example:

Observed group boundaries; relevant subsequence marked by […]
[             ]   |
 0 0 Y 1 1 Z 2 3 3
|     |     |     |
Segment boundaries

Possibility #1: Y=0, Z=2
Possibility #2: Y=1, Z=2
Possibility #3: Y=1, Z=1

As a consequence of this claim, the algorithm, regardless of how it works, must narrow the relevant subsequence to one group. By definition, it therefore must observe some group boundaries. The adversary now has the simple task of keeping open as many possibilities as it can.

At any given point during the algorithm's execution, the adversary is internally committed to one possibility for each memory location outside of the relevant subsequence. At the beginning, the relevant subsequence is the entire input, so there are no initial commitments. Whenever the algorithm probes an uncommitted location of xj, the adversary must commit to one of two values: j - 1, or j. If it can avoid letting the jth boundary be observed, it chooses a value that leaves at least half of the remaining possibilities (with respect to observation). Otherwise, it chooses so as to keep at least half of the groups in the relevant interval and commits values for the others.

In this way, the adversary forces the algorithm to observe at least log₂(k + 1) group boundaries, and in observing the jth group boundary, the algorithm is forced to make at least log₂(k + 1) probes.


Extensions:

This result extends straightforwardly to randomized algorithms by randomizing the input, replacing "at best halved" (from the algorithm's point of view) with "at best halved in expectation", and applying standard concentration inequalities.

It also extends to the case where no group can be larger than s copies; in this case the lower bound is Ω(log n log s).

Your argument applies to the class of algorithms that sequentially perform binary searches in order to find boundaries. There might be other approaches, so writing "every deterministic algorithm" is a bit of a stretch. Your argument is valuable and quite clever and addresses the approach that is probably on everyone's mind. I'd only drop the "every deterministic algorithm" part.
Bolo
No, it's a general lower bound because the structure of the input limits the amount of information that can be gathered by one query.
You're right, this *is* a general argument and seems convincing. (The only doubtful part to me was whether the adversary could make one side odd despite previous probes, but yes, it can.) Great answer!
ShreevatsaR
I am not completely convinced. Why does every algorithm _have_ to find boundaries? Why do we even need to have information about boundaries at all? Even assuming that every algorithm has to determine boundaries somehow, how can we ensure the Ω(log n) probe time for the subsequent boundaries?
Moron
About the Ω(log n) lower bound: say the algorithm looks at 0, steps to the right, finds a 3, steps to the left and finds a 2, steps more to the left and finds a 1, then a 0, etc. Now we could use the information about the 1, 2 and 3 in subsequent boundary finding, can't we? How have you ensured that this information does not affect the Ω(log n) probe time? Looking for the 0-1 boundary has not told you where the 1-2 boundary is, but it does help you narrow it down. So it is not exactly independent...
Moron
That's true in general, but for the set of inputs used by this lower bound, _a priori_, the sets of locations for two different boundaries may touch but not overlap, preventing that sort of inference.
Not 100% convinced, but I think you're on the right track. This kind of analysis of the problem is worth a lot more than the source code with wrong big-O estimates lots of other ppl keep posting...
R..
I don't see how you can avoid overlaps. What if the algorithm jumped more than sqrt(n) distance for some probe? Anyway, all I am saying is that the rightmost 1 and the leftmost 2 are what the next boundary is narrowed down to. How do you ensure that it is Ω(n^c)? With multiple boundaries, you have multiple such ranges. Anyway, sorry for so many questions...
Moron
I think I have the potential function right now...
Same issue with the potential function: Case 1) At best we halve the possible locations of a boundary, but we could possibly affect more than one boundary and their possible locations. Case 2) At some point when we pin down a boundary, the previous set of possibilities for that boundary need not be k+1, and so the offsetting increase term need not be log(k+1). Sorry, but I am not convinced; it is probably my inability to understand what you are doing exactly. You have spent a lot of effort trying to explain it to me already; I won't bother you anymore.
Moron
I think your answer is much too pessimistic because of the complexity/cost model that you have. As the answer given by Rotsor points out, things change when you go to more realistic models. In particular I think that any useful lower bound should also involve his parameter `K`.
Jens Gustedt
Since this is not a "real" problem, I see no reason why a "realistic" model should have a parameter K. Nevertheless, with N elements and at most K copies of any one element, this argument readily generalizes to yield a lower bound of Ω(log N log K).
(The comments above apply to the previous version.)
+1. You have convinced me :-)
Moron
Great, I wish I could upvote this again! See also Greg Kuperberg's rewrite of this answer below (and the older version of this answer).
ShreevatsaR
@throwawayacct thx for your effort in rewriting your argument. Only then was I able to understand it.
Dave
+1  A: 

Take the middle element e. Use binary search to find its first and last occurrence, in O(log(n)). If the count is odd, return e. Otherwise, recurse on the side that has an odd number of elements: [....]eeee[....]

The runtime will be log(n) + log(n/2) + log(n/4) + ... = O(log(n)^2).

Chad Brewbaker
A: 

Assume indexing start at 0. Binary search for the smallest even i such that x[i] != x[i+1]; your answer is x[i].

edit: due to public demand, here is the code

int f(int *x, int min, int max) {
  int size = max;
  min /= 2;
  max /= 2;
  while (min < max) {
    int i = (min + max)/2;
    if (i==0 || x[2*i-1] == x[2*i])
      min = i+1;
    else
      max = i-1;
  }
  if (2*max == size || x[2*max] != x[2*max+1])
    return x[2*max];
  return x[2*min];
}
David Lehavi
How can one search for the "smallest even i" with binary search? Imagine you pick an even probe near the middle of the array. Imagine further that x[i] == x[i+1]. How do you decide whether to proceed left or right of i?
Dave
@Dave: thanks for taking the point off w/o waiting for an answer. You now have i = 2j and x[i] == x[i+1], on the next iteration take j = j / 2, i = 2j.
David Lehavi
@David Lehavi let x := {a, a, b, b, c}. First probe: i == 2 => j == 1. x[i] == x[i+1] == b (the first b). Second iteration: new_j == 0, new_i == 0. That's where I'm confused. Also, in my example there is no even i for which x[i] != x[i+1].
Dave
@Dave I edited my answer to answer your question.
David Lehavi
I don't get the second iteration. "first" and "last" are the minimum and maximum index, so 0 and 4 in the beginning, right? first = first / 2 = 0 / 2 = 0, last = last / 2 + 1 = 4 / 2 + 1 = 3 (!= 2 as in your answer). Further, please note that in a={0,0,1} there is no even i for which a[i] != a[i+1], so even if your algorithm were correct, your wording wouldn't be. Can you please write a function "int oddNumber(int *array, int minIndex, int maxIndex)" (the indexes are inclusive)? Thanks in advance.
Dave
@Dave Hope this helps
David Lehavi
`int nums [] = {1, 1, 1, 2, 2, 2, 2, 3, 3}; cout << f(nums, 0, 8);` returns `2` instead of `1`. I assumed 0-based indexing and that `max` is the highest valid index.
IVlad
@Dave, lVlad - oops, mixed the +1, -1
David Lehavi
@David - still doesn't work on my example. Now it returns `3`.
IVlad
@David Lehavi Thank you for your effort. Since your solution is valid C code, you can copy-paste it into your favorite IDE, compile it, and see it fail like I did (a: {1,2,2}). Dear David, I do not mean to bash you at all! I'm very interested in the solution (so much that I'll start a bounty), and when I heard that someone came up with a solution, I wanted to make sure it was right and therefore looked intensively for errors. Maybe you made an error in reasoning, or maybe you are on the right track... I wish you luck winning the bounty. Greetings.
Dave
A: 

You can create a count array, count how many times each number occurs, and then find the element with an odd count in that array. Example:

int a[] = new int[]{2,3,4,2,3,1,4,5,6,5,6,7,1};
int b[] = new int[1000];
for (int i = 0; i < b.length; i++) {
    b[i] = 0;
}
for (int i = 0; i < a.length; i++) {
    b[a[i]]++;
}
for (int i = 0; i < b.length; i++) {
    if (b[i] != 0 && b[i] % 2 == 1) {
        System.out.println(i);
        break;
    }
}
Firstly, even a single iteration over the input array increases the time to O(N), which is unacceptable. Secondly, the numbers in the array can be arbitrarily big (bigger than 1000 or any other finite bound), so your code is incorrect (will throw ArrayIndexOutOfBoundsException for certain inputs).
Bolo
+6  A: 

This answer is in support of the answer posted by "throwawayacct". He deserves the bounty. I spent some time on this question and I'm totally convinced that his proof is correct that you need Ω(log(n)^2) queries to find the number that occurs an odd number of times. I'm convinced because I ended up recreating the exact same argument after only skimming his solution.

In the solution, an adversary creates an input to make life hard for the algorithm, but also simple for a human analyzer. The input consists of k pages that each have k entries. The total number of entries is n = k^2, and it is important that O(log(k)) = O(log(n)) and Ω(log(k)) = Ω(log(n)). To make the input, the adversary makes a string of length k of the form 00...011...1, with the transition in an arbitrary position. Then each symbol in the string is expanded into a page of length k of the form aa...abb...b, where on the ith page, a=i and b=i+1. The transition on each page is also in an arbitrary position, except that the parity agrees with the symbol that the page was expanded from.

It is important to understand the "adversary method" of analyzing an algorithm's worst case. The adversary answers queries about the algorithm's input, without committing to future answers. The answers have to be consistent, and the game is over when the adversary has been pinned down enough for the algorithm to reach a conclusion.

With that background, here are some observations:

1) If you want to learn the parity of a transition in a page by making queries in that page, you have to learn the exact position of the transition and you need Ω(log(k)) queries. Any collection of queries restricts the transition point to an interval, and any interval of length more than 1 has both parities. The most efficient search for the transition in that page is a binary search.

2) The most subtle and most important point: There are two ways to determine the parity of a transition inside a specific page. You can either make enough queries in that page to find the transition, or you can infer the parity if you find the same parity in both an earlier and a later page. There is no escape from this either-or. Any set of queries restricts the transition point in each page to some interval. The only restriction on parities comes from intervals of length 1. Otherwise the transition points are free to wiggle to have any consistent parities.

3) In the adversary method, there are no lucky strikes. For instance, suppose that your first query in some page is toward one end instead of in the middle. Since the adversary hasn't committed to an answer, he's free to put the transition on the long side.

4) The end result is that you are forced to directly probe the parities in Ω(log(k)) pages, and the work for each of these subproblems is also Ω(log(k)).

5) Things are not much better with random choices than with adversarial choices. The math is more complicated, because now you can get partial statistical information, rather than a strict yes you know a parity or no you don't know it. But it makes little difference. For instance, you can give each page length k^2, so that with high probability, the first log(k) queries in each page tell you almost nothing about the parity in that page. The adversary can make random choices at the beginning and it still works.

Greg Kuperberg
D'oh. Looks like we both had the idea of rewriting this argument :P
So it goes. I didn't mean my answer to be only a rewrite, but also an annotation or summary. Maybe it's still useful for that purpose.
Greg Kuperberg
+1: I suppose it will be useful, too.
Moron
@Greg Kuperberg very useful indeed. I was only able to follow throwawayacct's proof after his rewrite, and even then it was very difficult. Your explanations have a great clarity; wish my math prof had a similar skill. thx for your effort. kudos.
Dave
@Dave: You're very welcome and, seriously, praise like yours helps keep me going.
Greg Kuperberg
A: 

The clue is you're looking for log(n). That's less than n.

Stepping through the entire array, one at a time? That's n. That's not going to work.

We know the first two indexes in the array (0 and 1) should hold the same number. The same goes for indexes 50 and 51, provided the odd-count number comes after them.

So find the middle element in the array, compare it to the element right after it. If the change in numbers happens on the wrong index, we know the odd number in the array is before it; otherwise, it's after. With one set of comparisons, we figure out which half of the array the target is in.

Keep going from there.

Dean J
What if there is no change in value at the probe?
Dave