views:

1154

answers:

4

I came across this question on an interview questions thread. Here is the question:

Given two integer arrays A [1..n] and B[1..m], find the smallest window in A that contains all elements of B. In other words, find a pair < i , j > such that A[i..j] contains B[1..m].

If A doesn't contain all the elements of B, then i,j can be returned as -1. The integers in A need not be in the same order as they are in B. If there are more than one smallest window (different, but have the same size), then its enough to return one of them.

Example: A[1,2,5,11,2,6,8,24,101,17,8] and B[5,2,11,8,17]. The algorithm should return i = 2 (index of 5 in A) and j = 9 (index of 17 in A).

Now I can think of two variations.

Let's suppose that B has duplicates.

  1. This variation doesn't consider the number of times each element occurs in B. It just checks for all the unique elements that occur in B and finds the smallest corresponding window in A that satisfies the above problem. For example, if A[1,2,4,5,7] and B[2,2,5], this variation doesn't bother about there being two 2's in B and just checks A for the unique integers in B namely 2 and 5 and hence returns i=1, j=3.

  2. This variation accounts for duplicates in B. If there are two 2's in B, then it expects to see at least two 2's in A as well. If not, it returns -1,-1.

When you answer, please do let me know which variation you are answering. Pseudocode should do. Please mention space and time complexity if it is tricky to calculate it. Mention if your solution assumes array indices to start at 1 or 0 too.

Thanks in advance.

+1  A: 

Here is the solution I thought of (but it's not very neat).

I am going to illustrate it using the example in the question.

Let A[1,2,5,11,2,6,8,24,101,17,8] and B[5,2,11,8,17]

  1. Sort B. (So B = [2,5,8,11,17]). This step takes O(log m).

  2. Allocate an array C of size A. Iterate through elements of A, binary search for it in the sorted B, if it is found enter it's "index in sorted B + 1" in C. If its not found, enter -1. After this step,

A = [1 , 2, 5, 11, 2, 6, 8, 24, 101, 17, 8] (no changes, quoting for ease).

C = [-1, 1, 2, 4 , 1, -1, 3, -1, -1, 5, 3]

Time: (n log m), Space O(n).

  1. Find the smallest window in C that has all the numbers from 1 to m. For finding the window, I can think of two general directions: a. A bit oriented approach where in I set the bit corresponding to each position and finally check by some kind of ANDing. b. Create another array D of size m, go through C and when I encounter p in C, increment D[p]. Use this for finding the window.

Please leave comments regarding the general approach as such, as well as for 3a and 3b.

Skylark
Hey, this is actually pretty neat. The two-pointer "window" thing I described in my solution can be adapted to finding the smallest window in your C. (And it can also be adapted to Variant 2... with some possibly O(m) extra space.)
ShreevatsaR
Hey I didn't got "b. Create another array D of size m, go through C and when I encounter p in C, increment D[p]. Use this for finding the window."Can you explain a bit more on this/
Learner
A: 

If I am understanding this right. It wouldn't be too hard to solve the problem using duplicates.

So given these example arrays

A = {1,4,5,7,8,10}
B = {8,7}

We want to iterate over every element of the A Array with a nested for Loop like so.

for(int i = 0; i<A.length;i++){
    for(int j = i;j<A.length;j++){
    }
}

Those two for loops give us every possible window of A that we could want. Then inside each window of A(an iteration through the inner loop), we will create a dummy array of B containing booleans{}.

For example {false, false}

Then for every Element of your window, you want to check whether that value matches a value in B that doesnt have true in its dummy array position. If there is a match then toggle that position to true. If anytime all of the values in the dummy array are all true then stop and return the indices(Because this means you hit the smallest window). Otherwise temporarily save the indices and the size. Then as you check other windows compare the sizes and swap indices as necessary.

This however can get very nasty if you have huge arrays.

Shortest time is Size of B. Longest time is somewhere around >O(n^2log(size of A)) * size of B? To take a guess

maleki
I think the big-O time for this method is `O(n^2 * m)`, where `n` is the size of A and `m` is the size of B; in other words, it's cubic...
Andrzej Doyle
You are right that it would be very nasty if one did no optimization...
maleki
+4  A: 

Complexity

Time: O((m+n)log m)

Space: O(m)

The following is provably optimal up to a logarithmic factor. (I believe the log factor cannot be got rid of, and so it's optimal.)

Variant 1 is just a special case of variant 2 with all the multiplicities being 1, after removing duplicates from B. So it's enough to handle the latter variant; if you want variant 1, just remove duplicates in O(m log m) time. In the following, let m denote the number of distinct elements in B. We assume m < n, because otherwise we can just return -1, in constant time.

For each index i in A, we will find the smallest index s[i] such that A[i..s[i]] contains B[1..m], with the right multiplicities. The crucial observation is that s[i] is non-decreasing, and this is what allows us to do it in amortised linear time.

Start with i=j=1. We will keep a tuple (c[1], c[2], ... c[m]) of the number of times each element of B occurs, in the current window A[i..j]. We will also keep a set S of indices (a subset of 1..m) for which the count is "right" (i.e., k for which c[k]=1 in variant 1, or c[k] = <the right number> in variant 2).

So, for i=1, starting with j=1, increment each c[A[j]] (if A[j] was an element of B), check if c[A[j]] is now "right", and add or remove j from S accordingly. Stop when S has size m. You've now found s[1], in at most O(n log m) time. (There are O(n) j's, and each set operation took O(log m) time.)

Now for computing successive s[i]s, do the following. Increment i, decrement c[A[i]], update S accordingly, and, if necessary, increment j until S has size m again. That gives you s[i] for each i. At the end, report the (i,s[i]) for which s[i]-i was smallest.

Note that although it seems that you might be performing up to O(n) steps (incrementing j) for each i, the second pointer j only moves to the right: so the total number of times you can increment j is at most n. (This is amortised analysis.) Each time you increment j, you might perform a set operation that takes O(log m) time, so the total time is O(n log m). The space required was for keeping the tuple of counts, the set of elements of B, the set S, and some constant number of other variables, so O(m) in all.

There is an obvious O(m+n) lower bound, because you need to examine all the elements. So the only question is whether we can prove the log factor is necessary; I believe it is.

ShreevatsaR
It really depends on what operations the array elements support. If it's comparisons only, you're probably right; if they're numeric on the other hand, you can construct an FKS perfect hash in expected linear time with O(1) accesses.
Dave
That looks good. Few comments. 1. I guess it is implicit up there that B is sorted (so that it is easier to check if an element of A is in B). 2. Clarify this - does C[i] contain the frequency of element B[i] in the current window? That is how I understood it, if yes then C[A[j]] looks like wrong representation up there. 3. When one window is found, we can then increment i till it comes across an element that is present in B and then increment j then.
Skylark
1. Yeah, since we sort B initially (to remove duplicates, and/or find counts) we keep a set of the distinct elements in B, and we can look up "is in" in (log m) time. 2. Right, it should be C[index in B of A[j]] (or we can think of C[i] as the count of the number i; C[i] being a dictionary / hash table or whatever). 3. Yes, as long as i is not in B, the window doesn't change; we won't have to increment j.
ShreevatsaR
Cool. That looks like a good algorithm. I like this one better than the algo I've posted. I'll probably write a step-wise algorithm based on yours on post it again as an answer here.
Skylark
A: 

The answer 3a,3b of 1 I didn't able to get it

dsiap