views:

943

answers:

8

I found the following problem on the internet, and would like to know how I would go about solving it:

You are given an array ' containing 0s and 1s. Find O(n) time and O(1) space algorithm to find the maximum sub sequence which has equal number of 1s and 0s.

Examples:

  1. 10101010 - The longest sub sequence that satisfies the problem is the input itself
  2. 1101000 - The longest sub sequence that satisfies the problem is 110100
A: 

brute force: start with maximum length of the array to count the o's and l's. if o eqals l, you are finished. else reduce search length by 1 and do the algorithm for all subsequences of the reduced length (that is maximium length minus reduced length) and so on. stop when the subtraction is 0.

Peter Miehle
sounds like quadratic time complexity
Peter G.
yes, but maybe it is a good starting point for him to reach a better solution?
Peter Miehle
+2  A: 

New solution: Suppose we have for n-bit input bit-array 2*n-size array to keep position of bit. So, the size of array element must have enough size to keep maximum position number. For 256 input bit array, it's needed 256x2 array of bytes (byte is enough to keep 255 - the maximum position).

Moving from the first position of bit-array we put the position into array starting from the middle of array (index is n) using a rule:

1. Increment the position if we passed "1" bit and decrement when passed "0" bit

2. When meet already initialized array element - don't change it and remember the difference between positions (current minus taken from array element) - this is a size of local maximum sequence.

3. Every time we meet local maximum compare it with the global maximum and update if the latter is less.

For example: bit sequence is 0,0,0,1,0,1

   initial array index is n
   set arr[n] = 0 (position)
     bit 0 -> index--
   set arr[n-1] = 1 
     bit 0 -> index--
   set arr[n-2] = 2
     bit 0 -> index--
   set arr[n-3] = 3
     bit 1 -> index++
   arr[n-2] already contains 2 -> thus, local max seq is [3,2] becomes abs. maximum
      will not overwrite arr[n-2]
     bit 0 -> index--
   arr[n-3] already contains 3 -> thus, local max seq is [4,3] is not abs. maximum
     bit 1 -> index++
   arr[n-2] already contains 2 -> thus, local max seq is [5,2] is abs. max

Thus, we passing through the whole bit array only once. Does this solves the task?

input:
    n - number of bits
    a[n] - input bit-array

track_pos[2*n] = {0,};
ind = n;
/* start from position 1 since zero has
  meaning track_pos[x] is not initialized */
for (i = 1; i < n+1; i++) {
    if (track_pos[ind]) {
        seq_size = i - track_pos[ind];
        if (glob_seq_size < seq_size) {
            /* store as interm. result */
            glob_seq_size = seq_size;
            glob_pos_from = track_pos[ind];
            glob_pos_to   = i;
        }
    } else {
        track_pos[ind] = i;
    }

    if (a[i-1])
        ind++;
    else
        ind--;
}

output:
    glob_seq_size - length of maximum sequence
    glob_pos_from - start position of max sequence
    glob_pos_to   - end position of max sequence
Pmod
This finds the longest _prefix_ of the array that has equals 1s and 0s. Perhaps this special case is the sought answer. See my answer for an approach that finds the maximum subsequence wherever it starts from.
Dimitris Andreou
Actually found another, haven't yet compared with yours.
Pmod
Well, this extra array you are using, from which you start in the middle, and go "right" when you see a 1, and go "left" when you see a 0, is basically a stack; what I mention in my answer a dual stack. In my case, it either contains 1s or 0s, as in your case is either on the right (1s) or on the left (0s). You say "when you meet already initialized..."-this only happens when you are moving towards the middle - a pop()/match! The numbers you put in the array resemble what I call leftSeqIndex is my pseudocode. So you describe something similar, but you're missing some other parts.
Dimitris Andreou
Actually, it looks simpler. Which parts am I missing?
Pmod
I meant it lacked details (when I commented, the code was missing). Slight nitpicking though: your stack uses (needlessly) double the space than mine. (This is easily fixable of course).
Dimitris Andreou
There was no requirements on space, there was a requirement on computation complexity. And your stack push/pop operation takes much more than operations with array.
Pmod
Few points: a) there was a space requirement (albeit a mysterious one). b) Regardless (a), when you need a stack that you know never stores more than N elements, allocating an array of size 2N is simply wasteful (both in time and memory). c) (b) is trivially fixable (e.g. replacing the extra N array cells with just a single bit), d) Stack is an ADT, there are various concrete implementations of it, one being an array just like yours, so "push/pop operation takes much more...etc" makes no sense.
Dimitris Andreou
a) Allocating of an array hardly depends on its size in time.b) I don't believe that being implemented in C (as required) your solution will be simpler and faster. Probably we can finalize implementations and compare ;)
Pmod
a) Yes, but unless you want an array of garbage (whereas your code up there expects an array of zeros, apparently), the "magic" of initialization needs to take place :) b) feel free to finalize a *correct* implementation first, c) I take it that you agree on the rest of the points you didn't address, I understand people can be too lazy to type "I agree" sometimes :)
Dimitris Andreou
I was thinking about this last night, and came to the following conclusion. If you draw a graph of the bits where you increment for a 1 and decrement for a -1, then what we're looking for is the largest line that we can draw between two points at the same 'height'. I think that you're finding that line.
Yellowfog
@Yellowfog: You are absolutely right! I had the same graphic representation in mind while was constructing the algorithm.
Pmod
Did I actually write 'decrement for a -1' in the above? Or has my comment been oddly edited? Either way I obviously meant 'decrement for a 0'. Anyway, good work Pmod
Yellowfog
Either way - I understood that we have the same idea about finding the longest line ;) It's great to find link-minded people.
Pmod
+10  A: 

Update.

I have to completely rephrase my answer. (If you had upvoted the earlier version, well, you were tricked!)

Lets sum up the easy case again, to get it out of the way:

Find the longest prefix of the bit-string containing an equal number of 1s and 0s of the array.

This is trivial: A simple counter is needed, counting how many more 1s we have than 0s, and iterating the bitstring while maintaining this. The position where this counter becomes zero for the last time is the end of the longest sought prefix. O(N) time, O(1) space. (I'm completely convinced by now that this is what the original problem asked for. )

Now lets switch to the more difficult version of the problem: we no longer require subsequences to be prefixes - they can start anywhere.

After some back and forth thought, I thought there might be no linear algorithm for this. For example, consider the prefix "111111111111111111...". Every single 1 of those may be the start of the longest subsequence, there is no candidate subsequence start position that dominates (i.e. always gives better solutions than) any other position, so we can't throw away any of them (O(N) space) and at any step, we must be able to select the best start (which has an equal number of 1s and 0s to the current position) out of linearly many candidates, in O(1) time. It turns out this is doable, and easily doable too, since we can select the candidate based on the running sum of 1s (+1) and 0s (-1), this has at most size N, and we can store the first position we reach each sum in 2N cells - see pmod's answer below (yellowfog's comments and geometric insight too).

Failing to spot this trick, I had replaced a fast but wrong with a slow but sure algorithm, (since correct algorithms are preferable to wrong ones!):

  • Build an array A with the accumulated number of 1s from the start to that position, e.g. if the bitstring is "001001001", then the array would be [0, 0, 1, 1, 1, 2, 2, 2, 3]. Using this, we can test in O(1) whether the subsequence (i,j), inclusive, is valid: isValid(i, j) = (j - i + 1 == 2 * (A[j] - A[i - 1]), i.e. it is valid if its length is double the amount of 1s in it. For example, the subsequence (3,6) is valid because 6 - 3 + 1 == 2 * A[6] - A[2] = 4.
  • Plain old double loop:

    maxSubsLength = 0 for i = 1 to N - 1 for j = i + 1 to N if isValid(i, j) ... #maintain maxSubsLength end end

This can be sped up a bit using some branch-and-bound by skipping i/j sequences which are shorter than the current maxSubsLength, but asymptotically this is still O(n^2). Slow, but with a big plus on its side: correct!

Dimitris Andreou
I must admit I quite fed off by the "homework" comments as well :) Your stack based solution is quite interesting.
Matthieu M.
+8  A: 

Strictly speaking, the answer is that no such algorithm exists because the language of strings consisting of an equal number of zeros and ones is not regular.

Of course everyone ignores that fact that storing an integer of magnitude n is O(log n) in space and treats it as O(1) in space. :-) Pretty much all big-O's, including time ones, are full of (or rather empty of) missing log n factors, or equivalently, they assume n is bounded by the size of a machine word, which means you're really looking at a finite problem and everything is O(1).

R..
Good, square point regarding the impossibility of that. Also note the resemblance of my answer to a pushdown automaton (e.g. an automaton manipulating a stack, peeking it, pushing and popping symbols).
Dimitris Andreou
Could you elaborate on the "no such algorithm exists" postulate? It sounds like you can prove it. If so, I would be interested in the proof.
Adam Shiemke
Wikipedia's page on the pumping lemma (see http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages) proves that 0^p1^p is a string that does not properly pump for the language L=0^p1^p. It can also be used to show that the language the OP describes is not regular either. As for why this matters, remember that a regular language can be represented by an equivalent finite automaton; finite automatons have no memory beyond the bare minimum needed for maintaining state. The problem is O(1) in space, but parsing a non-regular language is necessarily not O(n) in space.
Platinum Azure
Hi Adam. As Platinum Azure said, the Pumping Lemma can be used to prove that the language is not regular. Further, it's a theorem that any language which can be recognized in constant space is regular, so if you assume there's a constant space algorithm, you've reached a contradiction.
R..
@R.. @Platinum Azure Thanks.
Adam Shiemke
A: 

As was pointed out by user "R..", there is no solution, strictly speaking, unless you ignore the "log n" space complexity. In the following, I will consider that the array length fits in a machine register (e.g. a 64-bit word) and that a machine register has size O(1).

The important point to notice is that if there are more 1's than 0's, then the maximum subsequence that you are looking for necessarily includes all the 0's, and that many 1's. So here the algorithm:

Notations: the array has length n, indices are counted from 0 to n-1.

  1. First pass: count the number of 1's (c1) and 0's (c0). If c1 = c0 then your maximal subsequence is the entire array (end of algorithm). Otherwise, let d be the digit which appears the less often (d = 0 if c0 < c1, otherwise d = 1).
  2. Compute m = min(c0, c1) * 2. This is the size of the subsequence you are looking for.
  3. Second pass: scan the array to find the index j of the first occurrence of d.
  4. Compute k = max(j, n - m). The subsequence starts at index k and has length m.

Note that there could be several solutions (several subsequences of maximal length which match the criterion).

In plain words: assuming that there are more 1's than 0's, then I consider the smallest subsequence which contains all the 0's. By definition, that subsequence is surrounded by bunches of 1's. So I just grab enough 1's from the sides.

Edit: as was pointed out, this does not work... The "important point" is actually wrong.

Thomas Pornin
This won't work for 10001.
Nabb
Step 1 is fine. After that, find the first occurrence of d. Process the bit string from there, keeping a counter which is initialized to 1 and incremented whenever you encounter d, and decremented when you encounter 1-d. Whenever the counter hits 0, you have a subsequence with an equal number of 0s and 1s. Store the current position and continue processing. When you hit the end of the bit string, the last position stored will give a maximal subsequence.
R..
@R..: will that work for 110?
Nate Kohl
The step 2 can't produce the correct result, unless you interpret that the subsequence needs not be contiguous. Nabb's counterexample is the simplest that invalidates this logic - this decides that the subsequence has length 4, but you need to throw away a zero, breaking the chain.
Dimitris Andreou
@Nate, you're right, missed that case. If you hit end of the bit string without getting an equal number of 0s and 1s, you need to back up from the beginning by the number of positions you're off by.
R..
A: 

Try something like this:

/* bit(n) is a macro that returns the nth bit, 0 or 1. len is number of bits */
int c[2] = {0,0};
int d, i, a, b, p;
for(i=0; i<len; i++) c[bit(i)]++;
d = c[1] < c[0];
if (c[d] == 0) return; /* all bits identical; fail */
for(i=0; bit(i)!=d; i++);
a = b = i;
for(p=0; i<len; i++) {
  p += 2*bit(i)-1;
  if (!p) b = i;
}
if (a == b) { /* account for case where we need bits before the first d */
  b = len - 1;
  a -= abs(p);
}
printf("maximal subsequence consists of bits %d through %d\n", a, b);

Completely untested but modulo stupid mistakes it should work. Based on my reply to Thomas's answer which failed in certain cases.

R..
A: 

In this thread ( http://discuss.techinterview.org/default.asp?interview.11.792102.31 ), poster A.F. has given an algorithm that runs in O(n) time and uses O(sqrt(n log n)) bits.

A: 

New Solution: Space complexity of O(1) and time complexity O(n^2)

        int iStart = 0, iEnd = 0;
        int[] arrInput = { 1, 0, 1, 1, 1,0,0,1,0,1,0,0 };

        for (int i = 0; i < arrInput.Length; i++)
        {
            int iCurrEndIndex = i;
            int iSum = 0;
            for (int j = i; j < arrInput.Length; j++)
            {                    
                iSum = (arrInput[j] == 1) ? iSum+1 : iSum-1;
                if (iSum == 0)
                {
                    iCurrEndIndex = j;
                }

            }
            if ((iEnd - iStart) < (iCurrEndIndex - i))
            {
                iEnd = iCurrEndIndex;
                iStart = i;
            }
        }
kirant400
The requirement was: O(n) time
Pmod