ansaurus

Question

Algorithm to determine indices i..j of array A containing all the elements of another array B

Answer 1

+1 A:

Here is the solution I thought of (but it's not very neat).

I am going to illustrate it using the example in the question.

Let A[1,2,5,11,2,6,8,24,101,17,8] and B[5,2,11,8,17]

Sort B. (So B = [2,5,8,11,17]). This step takes O(log m).
Allocate an array C of size A. Iterate through elements of A, binary search for it in the sorted B, if it is found enter it's "index in sorted B + 1" in C. If its not found, enter -1. After this step,

A = [1 , 2, 5, 11, 2, 6, 8, 24, 101, 17, 8] (no changes, quoting for ease).

C = [-1, 1, 2, 4 , 1, -1, 3, -1, -1, 5, 3]

Time: (n log m), Space O(n).

Find the smallest window in C that has all the numbers from 1 to m. For finding the window, I can think of two general directions: a. A bit oriented approach where in I set the bit corresponding to each position and finally check by some kind of ANDing. b. Create another array D of size m, go through C and when I encounter p in C, increment D[p]. Use this for finding the window.

Please leave comments regarding the general approach as such, as well as for 3a and 3b.

Skylark 2009-05-29 13:45:15

Hey, this is actually pretty neat. The two-pointer "window" thing I described in my solution can be adapted to finding the smallest window in your C. (And it can also be adapted to Variant 2... with some possibly O(m) extra space.)

ShreevatsaR 2009-05-29 14:38:12

Hey I didn't got "b. Create another array D of size m, go through C and when I encounter p in C, increment D[p]. Use this for finding the window."Can you explain a bit more on this/

Learner 2009-07-07 04:39:49

Answer 2

A:

If I am understanding this right. It wouldn't be too hard to solve the problem using duplicates.

So given these example arrays

A = {1,4,5,7,8,10}
B = {8,7}

We want to iterate over every element of the A Array with a nested for Loop like so.

for(int i = 0; i<A.length;i++){
    for(int j = i;j<A.length;j++){
    }
}

Those two for loops give us every possible window of A that we could want. Then inside each window of A(an iteration through the inner loop), we will create a dummy array of B containing booleans{}.

For example {false, false}

Then for every Element of your window, you want to check whether that value matches a value in B that doesnt have true in its dummy array position. If there is a match then toggle that position to true. If anytime all of the values in the dummy array are all true then stop and return the indices(Because this means you hit the smallest window). Otherwise temporarily save the indices and the size. Then as you check other windows compare the sizes and swap indices as necessary.

This however can get very nasty if you have huge arrays.

Shortest time is Size of B. Longest time is somewhere around >O(n^2log(size of A)) * size of B? To take a guess

maleki 2009-05-29 14:04:17

I think the big-O time for this method is `O(n^2 * m)`, where `n` is the size of A and `m` is the size of B; in other words, it's cubic...

Andrzej Doyle 2009-05-29 14:14:53

You are right that it would be very nasty if one did no optimization...

maleki 2009-05-29 14:43:11

Answer 3

+4 A:

Complexity

Time: O((m+n)log m)

Space: O(m)

The following is provably optimal up to a logarithmic factor. (I believe the log factor cannot be got rid of, and so it's optimal.)

Variant 1 is just a special case of variant 2 with all the multiplicities being 1, after removing duplicates from B. So it's enough to handle the latter variant; if you want variant 1, just remove duplicates in O(m log m) time. In the following, let m denote the number of distinct elements in B. We assume m < n, because otherwise we can just return -1, in constant time.

For each index i in A, we will find the smallest index s[i] such that A[i..s[i]] contains B[1..m], with the right multiplicities. The crucial observation is that s[i] is non-decreasing, and this is what allows us to do it in amortised linear time.

Start with i=j=1. We will keep a tuple (c[1], c[2], ... c[m]) of the number of times each element of B occurs, in the current window A[i..j]. We will also keep a set S of indices (a subset of 1..m) for which the count is "right" (i.e., k for which c[k]=1 in variant 1, or c[k] = <the right number> in variant 2).

So, for i=1, starting with j=1, increment each c[A[j]] (if A[j] was an element of B), check if c[A[j]] is now "right", and add or remove j from S accordingly. Stop when S has size m. You've now found s[1], in at most O(n log m) time. (There are O(n) j's, and each set operation took O(log m) time.)

Now for computing successive s[i]s, do the following. Increment i, decrement c[A[i]], update S accordingly, and, if necessary, increment j until S has size m again. That gives you s[i] for each i. At the end, report the (i,s[i]) for which s[i]-i was smallest.

Note that although it seems that you might be performing up to O(n) steps (incrementing j) for each i, the second pointer j only moves to the right: so the total number of times you can increment j is at most n. (This is amortised analysis.) Each time you increment j, you might perform a set operation that takes O(log m) time, so the total time is O(n log m). The space required was for keeping the tuple of counts, the set of elements of B, the set S, and some constant number of other variables, so O(m) in all.

There is an obvious O(m+n) lower bound, because you need to examine all the elements. So the only question is whether we can prove the log factor is necessary; I believe it is.

ShreevatsaR 2009-05-29 14:30:49

It really depends on what operations the array elements support. If it's comparisons only, you're probably right; if they're numeric on the other hand, you can construct an FKS perfect hash in expected linear time with O(1) accesses.

Dave 2009-05-30 03:03:33

That looks good. Few comments. 1. I guess it is implicit up there that B is sorted (so that it is easier to check if an element of A is in B). 2. Clarify this - does C[i] contain the frequency of element B[i] in the current window? That is how I understood it, if yes then C[A[j]] looks like wrong representation up there. 3. When one window is found, we can then increment i till it comes across an element that is present in B and then increment j then.

Skylark 2009-05-30 08:50:23

1. Yeah, since we sort B initially (to remove duplicates, and/or find counts) we keep a set of the distinct elements in B, and we can look up "is in" in (log m) time. 2. Right, it should be C[index in B of A[j]] (or we can think of C[i] as the count of the number i; C[i] being a dictionary / hash table or whatever). 3. Yes, as long as i is not in B, the window doesn't change; we won't have to increment j.

ShreevatsaR 2009-05-30 11:12:27

Cool. That looks like a good algorithm. I like this one better than the algo I've posted. I'll probably write a step-wise algorithm based on yours on post it again as an answer here.

Skylark 2009-05-30 12:10:19

Answer 4

A:

The answer 3a,3b of 1 I didn't able to get it

dsiap 2010-04-27 05:32:41

ansaurus

tags:

views:

answers:

Algorithm to determine indices i..j of array A containing all the elements of another array B

Complexity

Time: O((m+n)log m)

Space: O(m)

related questions