views: 773 | answers: 10

I was recently given this interview question and I'm curious what a good solution to it would be.

Say I'm given a 2d array where all the numbers in the array are in increasing order from left to right and top to bottom.

What is the best way to search and determine if a target number is in the array?

Now, my first inclination is to utilize a binary search since my data is sorted. I can determine if a number is in a single row in O(log N) time. However, it is the 2 directions that throw me off.

Another solution I thought might work is to start somewhere in the middle. If the middle value is less than my target, then I can be sure the target is not in the square portion above and to the left of the middle. I then move diagonally and check again, reducing the size of the square that the target could potentially be in, until I have honed in on the target number.

Does anyone have any good ideas on solving this problem?

Example array:

Sorted left to right, top to bottom.

1 2 4 5 6  
2 3 5 7 8  
4 6 8 9 10  
5 8 9 10 11  
A: 

A. Do a binary search on those lines where the target number might be on.

B. Make it a graph: look for the number by always taking the smallest unvisited neighbour node, backtracking when a number that is too big is found.

Tuomas Pelkonen
A: 

Binary search would be the best approach, imo. Starting at 1/2 x, 1/2 y will cut it in half. I.e. a 5x5 square would be something like x == 2 / y == 3. I rounded one value down and one value up to better home in on the direction of the targeted value.

For clarity the next iteration would give you something like x == 1 / y == 2 OR x == 3 / y == 5

Woot4Moo
note that you get a triangle, not a square after division
Drakosha
Good catch Drakosha
Woot4Moo
+5  A: 

I would use the divide-and-conquer strategy for this problem, similar to what you suggested, but the details are a bit different.

This will be a recursive search on subranges of the matrix.

At each step, pick an element in the middle of the range. If the value found is what you are seeking, then you're done.

Otherwise, if the value found is less than the value that you are seeking, then you know that it is not in the quadrant above and to the left of your current position. So recursively search the two subranges: everything (exclusively) below the current position, and everything (exclusively) to the right that is at or above the current position.

Otherwise, (the value found is greater than the value that you are seeking) you know that it is not in the quadrant below and to the right of your current position. So recursively search the two subranges: everything (exclusively) to the left of the current position, and everything (exclusively) above the current position that is on the current column or a column to the right.

And ba-da-bing, you found it.

Note that each recursive call deals only with the current subrange, not (for example) ALL rows above the current position. Just those in the current subrange.

Here's some pseudocode for you:

bool numberSearch(int[][] arr, int value, int minX, int maxX, int minY, int maxY)

if (minX == maxX and minY == maxY and arr[minX,minY] != value)
    return false
if (arr[minX,minY] > value) return false;  // Early exits if the value can't be in 
if (arr[maxX,maxY] < value) return false;  // this subrange at all.
int nextX = (minX + maxX) / 2
int nextY = (minY + maxY) / 2
if (arr[nextX,nextY] == value)
{
    print nextX,nextY
    return true
}
else if (arr[nextX,nextY] < value)
{
    if (numberSearch(arr, value, minX, maxX, nextY + 1, maxY))
        return true
    return numberSearch(arr, value, nextX + 1, maxX, minY, nextY)
}
else
{
    if (numberSearch(arr, value, minX, nextX - 1, minY, maxY))
        return true
    return numberSearch(arr, value, nextX, maxX, minY, nextY)
}
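For anyone who wants to run it, here is a direct Python translation of the pseudocode above (a sketch: the function name and the explicit empty-range guard are mine; the matrix is assumed to be a non-empty rectangular list of lists):

```python
def number_search(arr, value, min_x, max_x, min_y, max_y):
    # Empty subrange: nothing to search.
    if min_x > max_x or min_y > max_y:
        return False
    # Early exits if the value can't be in this subrange at all.
    if arr[min_x][min_y] > value or arr[max_x][max_y] < value:
        return False
    next_x = (min_x + max_x) // 2
    next_y = (min_y + max_y) // 2
    if arr[next_x][next_y] == value:
        return True
    if arr[next_x][next_y] < value:
        # The value can't be in the quadrant above and left of the midpoint.
        return (number_search(arr, value, min_x, max_x, next_y + 1, max_y)
                or number_search(arr, value, next_x + 1, max_x, min_y, next_y))
    # The value can't be in the quadrant below and right of the midpoint.
    return (number_search(arr, value, min_x, next_x - 1, min_y, max_y)
            or number_search(arr, value, next_x, max_x, min_y, next_y))
```

Called on the whole matrix as `number_search(arr, value, 0, len(arr) - 1, 0, len(arr[0]) - 1)`.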
Jeffrey L Whitledge
+1: This is an O(log(N)) strategy, and thus is as good an order as one is going to get.
Rex Kerr
@Rex Kerr - It looks like O(log(N)), since that's what a normal binary search is; however, note that there are potentially two recursive calls at each level. This means it is much worse than plain logarithmic. I don't believe the worst case is any better than O(M+N) since, potentially, every row or every column must be searched. I would guess that this algorithm could beat the worst case for a lot of values, though. And the best part is that it's parallelizable, since that's where the hardware is headed lately.
Jeffrey L Whitledge
@JLW: It is O(log(N))--but it's actually O(log_(4/3)(N^2)) or something like that. See Svante's answer below. Your answer is actually the same (if you meant recursive in the way I think you did).
Rex Kerr
@Svante - The subarrays do not overlap. In the first option, they have no y-element in common. In the second option, they have no x-element in common.
Jeffrey L Whitledge
Ah, yes, I misread; sorry for that. Your solution is absolutely viable.
Svante
I am not certain of the details, but I think the complexity of this algorithm works out to something pretty good. When one dimension is much larger than the other, then this becomes a binary search on that larger dimension. Only the smaller dimension requires a full scan in the worst case. If we assign m to the larger dimension, then this algorithm should work in time proportional to log(m)+n (i.e., O(n)). The stair-stepping algorithm takes m+n steps (i.e., O(m)). So I believe this recursive search is better than the stair-step to the extent that one dimension is larger than the other.
Jeffrey L Whitledge
I'm not sure if this is logarithmic. I computed the complexity using the approximate recurrence relation T(0) = 1, T(A) = T(A/2) + T(A/4) + 1, where A is the search area, and ended up with T(A) = O(Fib(lg(A))), which is approximately O(A^0.7) and worse than O(n+m), which is O(A^0.5). Maybe I made some stupid mistake, but it looks like the algorithm is wasting a lot of time going down fruitless branches.
Strilanc
@Strilanc - define "fruitless". What branches may be excluded?
Jeffrey L Whitledge
@Strilanc - I thought about it, and I realized that some subranges should be exited immediately, so I added a couple of extra lines to the pseudocode. Does that address your objections?
Jeffrey L Whitledge
I mean that the answer is in the top right, but you can't eliminate the bottom-left without recursing down to the single-element level. You end up doing at least a linear search along the bottom-left to top-right curve defined by the border between elements smaller or larger than your target.
Strilanc
@Strilanc - Yes, that's true, but as Rafał Dowgird points out in another answer, that condition is unavoidable by any algorithm. It is "fruitless", but it's not useless!
Jeffrey L Whitledge
@Jeffrey - Then why are you claiming logarithmic time? [rereads post] Oh. It's the first comment claiming that. Alright then.
Strilanc
A: 

Given a square matrix as follows:

[ a b c ]
[ d e f ]
[ i j k ]

We know that a < c, d < f, i < k. What we don't know is whether d < c or d > c, etc. We have guarantees only in 1-dimension.

Looking at the end elements (c,f,k), we can do a sort of filter: is N <= c ? search() : next(). Thus, we have n iterations over the rows, with each row taking either O(log(n)) for the binary search or O(1) if filtered out.

Let me give an EXAMPLE where N = j,

1) Check row 1. j <= c? (no, go next)

2) Check row 2. j <= f? (yes, bin search gets nothing)

3) Check row 3. j <= k? (yes, bin search finds it)

Try again with N = q,

1) Check row 1. q <= c? (no, go next)

2) Check row 2. q <= f? (no, go next)

3) Check row 3. q <= k? (no, go next)

There is probably a better solution out there but this is easy to explain.. :)
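As a sketch (the function name is mine), the row-filter idea in Python, using `bisect_left` for the per-row binary search:

```python
from bisect import bisect_left

def search_rows(matrix, target):
    for row in matrix:
        # Filter: the row can only contain target if target <= its last element.
        if row and target <= row[-1]:
            i = bisect_left(row, target)  # leftmost position where target would go
            if i < len(row) and row[i] == target:
                return True
    return False
```

This is O(n log m) for n rows of m elements, minus whatever the filter skips.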

apandit
+9  A: 

Here's a simple approach:

  1. Start at the bottom-left corner.
  2. If the target is less than that value, it must be above us, so move up one.
  3. Otherwise we know that the target can't be in that column, so move right one.
  4. Goto 2.

For an NxM array, this runs in O(N+M). I think it would be difficult to do better. :)

Edit: Lots of good discussion. I was talking about the general case above; clearly, if N or M is small, you could use a binary search approach to do this in something approaching logarithmic time.
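The four steps above can be sketched in Python (the function name is mine; assumes a rectangular list of lists):

```python
def staircase_search(matrix, target):
    if not matrix or not matrix[0]:
        return False
    row, col = len(matrix) - 1, 0          # 1. start at the bottom-left corner
    while row >= 0 and col < len(matrix[0]):
        value = matrix[row][col]
        if value == target:
            return True
        elif value > target:
            row -= 1                       # 2. target must be above us
        else:
            col += 1                       # 3. target can't be in this column
    return False                           # walked off the matrix: not present
```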

Nate Kohl
apply binary search to the diagonal walk and you've got O(logN) or O(logM) whichever is higher.
Anurag
I like this approach. I don't know whether my answer is better than O(N+M) or not, my algorithm-foo isn't that strong, but this one is certainly easy to implement correctly.
Jeffrey L Whitledge
@Anurag - I don't think the complexity works out that well. A binary search will give you a good place to start, but you'll have to walk one dimension or the other all the way, and in the worst case, you could still start in one corner and end in the other.
Jeffrey L Whitledge
It is not very difficult to do better than _O(n+m)_. :)
Svante
@Svante: I'm not convinced yet...see my note on your answer. :)
Nate Kohl
Actually, you are right. It can't get better than _O(n+m)_. Jeffrey has given the compelling reason: It does not matter which value you test, the target could still be in any row and in any column. There are just restrictions on the _combination_ of row and column.
Svante
See my answer for a lower bound on this problem.
Rafał Dowgird
You can do much better than O(m+n) if one dimension is much larger than the other. You can do a logarithmic search on the larger dimension. This linear scan is not the best possible solution.
Jeffrey L Whitledge
@Svante - What I actually said was that the target could be in any row OR any column--not any row AND any column. It is possible to exclude huge numbers of rows if every column is searched, or vice versa. Just binary search the larger dimension and scan the smaller dimension (which is in essence what my algorithm does).
Jeffrey L Whitledge
@Nate Kohl: It is obvious that we need to start from either `bottom left` or `top right` corners. But, I am not able to convince myself why we cannot start from the other two corners. Can you state the reason? Thanks.
Lazer
@eSKay: Well, one answer is that starting from BL or TR gives you one path to take, but TL or BR doesn't. For example, if you start at the TL, and the target isn't at the TL location, there are three possible paths to take: down, diagonal, and to the right. Looking at all of those paths leads to a less efficient search.
Nate Kohl
A: 

Interesting question. Consider this idea - create one boundary where all the numbers are greater than your target and another where all the numbers are less than your target. If anything is left in between the two, that's your target.

If I'm looking for 3 in your example, I read across the first row until I hit 4, then look for the smallest adjacent number (including diagonals) greater than 3:

1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11

Now I do the same for those numbers less than 3:

1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11

Now I ask, is anything inside the two boundaries? If yes, it must be 3. If no, then there is no 3. Sort of indirect since I don't actually find the number, I just deduce that it must be there. This has the added bonus of counting ALL the 3's.

I tried this on some examples and it seems to work OK.

Grembo
A down vote with no comment? I think this is O(N^(1/2)) since the worst-case performance requires a check of the diagonal. At least show me a counterexample where this method doesn't work!
Grembo
+5  A: 

Start in the middle. If this value is bigger than the target, then the target cannot be in the lower right quadrant. If it is smaller, then it cannot be in the upper left quadrant. You can thus do a binary search on the upper left to lower right diagonal, which will take O(log2(max(n,m))). From the endpoint of this binary search, the target cannot be in the lower right nor the upper left "quadrant". You have now at least halved the problem space. Recurse on the remaining "quadrants". Overall time complexity seems to be O(log2(max(n,m)).log2(n.m)) = O(log2(max(n,m)+n.m)), which is asymptotically the same as O(log2(n.m)). If one of the dimensions gets much smaller than the other, this approaches the O(log2n) of an ordinary binary search.

UPDATE: This is wrong. This algorithm is actually something like O(log2(max(n,m)).min(n,m)), as pointed out in the comments. Please redirect your attention to Jeffrey's and Nate's solutions. I am not sure about which is actually better now; I'll try to analyze. I am sorry for the confusion.

Svante
+1 I can't check your math right now, but it looks good to me and the subscripts are pretty. Well done!
Jeffrey L Whitledge
Neat, but I'm not sure about the analysis. Worst case is splitting the square into fourths, which means on iteration j you'll be looking at 2^j squares, each with a diagonal of n/2^j. Total work: sum from j=0 to log(n) of 2^j * log(n/2^j), which converges to O(n).
Nate Kohl
That would mean that the reasoning "you have halved the problem space" is flawed. In fact, I am, in a sense, only doing a binary search in one dimension, while I have a hidden linear search in the other; it seems thus to be _O(log(max(n,m)) min(n,m))_. I'll try to get some experimental data to confirm this.
Svante
A: 

Well, to begin with, let us assume we are using a square.

1 2 3
2 3 4
3 4 5

1. Searching a square

I would use a binary search on the diagonal. The goal is to locate the smallest number that is not strictly lower than the target number.

Say I am looking for 4 for example, then I would end up locating 5 at (2,2).

Then, I am assured that if 4 is in the table, it is at a position either (x,2) or (2,x) with x in [0,2]. Well, that's just 2 binary searches.

The complexity is not daunting: O(log(N)) (3 binary searches on ranges of length N)

2. Searching a rectangle, naive approach

Of course, it gets a bit more complicated when N and M differ (with a rectangle), consider this degenerate case:

1  2  3  4  5  6  7  8
2  3  4  5  6  7  8  9
10 11 12 13 14 15 16 17

And let's say I am looking for 9... The diagonal approach is still good, but the definition of diagonal changes. Here my diagonal is [1, (5 or 6), 17]. Let's say I picked up [1,5,17], then I know that if 9 is in the table it is either in the subpart:

            5  6  7  8
            6  7  8  9
10 11 12 13 14 15 16

This gives us 2 rectangles:

5 6 7 8
6 7 8 9

10 11 12 13 14 15 16

So we can recurse! Probably beginning with the one with fewer elements (though in this case it kills us).

I should point out that if one of the dimensions is less than 3, we cannot apply the diagonal method and must use a binary search. Here it would mean:

  • Apply binary search on 10 11 12 13 14 15 16, not found
  • Apply binary search on 5 6 7 8, not found
  • Apply binary search on 6 7 8 9, not found

It's tricky because to get good performance you might want to differentiate between several cases, depending on the general shape....

3. Searching a rectangle, brutal approach

It would be much easier if we dealt with a square... so let's just square things up.

1  2  3  4  5  6  7  8
2  3  4  5  6  7  8  9
10 11 12 13 14 15 16 17
17 .  .  .  .  .  .  17
.                    .
.                    .
.                    .
17 .  .  .  .  .  .  17

We now have a square.

Of course, we will probably NOT actually create those rows, we could simply emulate them.

def get(x,y):
  if x < N and y < M: return table[x][y]
  else: return table[N-1][M-1]            # the max

so it behaves like a square without occupying more memory (at the cost of speed, probably, depending on cache... oh well :p)
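A runnable sketch of that emulation (the helper name is mine; assumes a non-empty rectangular table sorted along rows and columns):

```python
def make_square_getter(table):
    n, m = len(table), len(table[0])
    s = max(n, m)  # side of the emulated square
    def get(x, y):
        if x < n and y < m:
            return table[x][y]
        return table[n - 1][m - 1]  # padding cells all hold the max element
    return get, s
```

Any square-matrix search can then call `get(x, y)` for 0 <= x, y < s as if the table really were s-by-s; padding with the maximum keeps both sort orders intact.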

Matthieu M.
+2  A: 

This is a short proof of the lower bound on the problem.

You cannot do it better than linear time (in terms of array dimensions, not the number of elements). In the array below, each of the elements marked as * can be either 5 or 6 (independently of other ones). So if your target value is 6 (or 5) the algorithm needs to examine all of them.

1 2 3 4 *
2 3 4 * 7
3 4 * 7 8
4 * 7 8 9
* 7 8 9 10

Of course this expands to bigger arrays as well. This means that the O(N+M) answer above is optimal.

Update: As pointed out by Jeffrey L Whitledge, it is only optimal as the asymptotic lower bound on running time vs input data size (treated as a single variable). Running time treated as two-variable function on both array dimensions can be improved.

Rafał Dowgird
You have not demonstrated that that answer is optimal. Consider, for example, an array that's ten across and one million down, in which the fifth row contains values all higher than the target value. In that case the proposed algorithm will do a linear search up 999,995 values before getting close to the target. A bifurcating algorithm like mine will only search 18 values before nearing the target. And it performs (asymptotically) no worse than the proposed algorithm in all other cases.
Jeffrey L Whitledge
@Jeffrey: It is a *lower bound* on the problem for the pessimistic case. You can optimize for good inputs, but there exist inputs where you cannot do better than linear.
Rafał Dowgird
@Rafał Dowgird - Yes, there do exist inputs where you cannot do better than linear. In which case my algorithm performs that linear search. But there are other inputs where you can do *way* better than linear. Thus the proposed solution is not optimal, since it always does a linear search.
Jeffrey L Whitledge
This shows the algorithm must take BigOmega(min(n,m)) time, not BigOmega(n+m). That's why you can do much better when one dimension is significantly smaller. For example, if you know there will only be 1 row, you can solve the problem in logarithmic time. I think an optimal algorithm will take time O(min(n+m, n lg m, m lg n)).
Strilanc
Updated the answer accordingly.
Rafał Dowgird
+1  A: 

EDIT:

I misunderstood the question. As the comments point out this only works in the more restricted case.

In a language like C that stores data in row-major order, simply treat it as a 1D array of size n * m and use a binary search.
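In that restricted case (each row's first element greater than the previous row's last, so the matrix is fully sorted in row-major order), a sketch (the function name is mine):

```python
def search_row_major(matrix, target):
    if not matrix or not matrix[0]:
        return False
    n, m = len(matrix), len(matrix[0])
    lo, hi = 0, n * m                  # binary search over flat indices [0, n*m)
    while lo < hi:
        mid = (lo + hi) // 2
        if matrix[mid // m][mid % m] < target:   # map flat index back to 2D
            lo = mid + 1
        else:
            hi = mid
    return lo < n * m and matrix[lo // m][lo % m] == target
```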

Hugh Brackett
Yes, why make it more complex than it has to be.
erikkallen
The array is not sorted as a whole, thus no binary search can be applied to it.
Miollnyr
This will only work if the last element of each row is higher than the first element on the next row, which is a much more restrictive requirement than the problem proposes.
Jeffrey L Whitledge
Thanks, I've edited my answer. Didn't read carefully enough, particularly the example array.
Hugh Brackett