views:

150

answers:

4

This question arose from the discussion in the comments of this answer.

First, let's say it's quite difficult to define what out-of-order is. Taking the example Pavel Shved gave, in the list [1,5,10,2,3,4,5,6,7,8,9,10,11] we can "clearly" see that 5 and 10 (indices 1 and 2) are out of order. But a naïve algorithm that simply checks some kind of sorted list invariant would not point those out.

  • checking a[i-1]<=a[i] for all 0<i<=N would yield the element at index 3 (which is 2);

  • checking a[j]<=a[i] for all 0<=i<=N and 0<=j<=i would yield all elements in indices 3 to 12;

My question is: can you think of an algorithm to solve this problem that yields the "correct answer" (i.e. indices 1 and 2)? If so, under what time and memory complexity would it run?

+8  A: 

Probably the best approach to this would be to first find the longest increasing subsequence and then consider all elements not part of that sequence to be out of order. The algorithm provided on the linked page runs in O(n log n) time and uses O(n) space (in addition to that of the list itself).

Such an approach would definitely yield the correct answer for your example case, since the longest increasing subsequence would be the 1 through 11 sequence not including the extra 5 and 10.

Amber
That works in my first example but look at this one: [1, 10, 2, 3, 4, 6, 5]. 10 and 6 are out of order, but 1 and 5 not... The more I think of it, the more my question seems a moot point, because I can't quite define what out of order is, as Botz3000 points out..
Martinho Fernandes
Oh, wait! I didn't check the link because I thought I knew what longest increasing subsequence meant! Then I read this "This subsequence is not necessarily contiguous". Now, I think that may be the solution.
Martinho Fernandes
Note that with [1, 2, 3, 4, 6, 5] what "out of order" means is ambiguous - the 6 could be out of order, OR the 5 might be considered out of order, since moving either one a single place to the left or right would put that portion of the list "into order". In other words, you could say "the 5 is too far right" or "the 6 is too far left" and both would be essentially equivalent in terms of "degree of disorder".
Amber
@Dav, you are correct. As pointed out in the wikipedia link provided, the longest increasing subsequence (let's call it LIS for short) is not unique. But the for original goal of this algorithm (see the link in the question) pointing any one of them is enough.
Martinho Fernandes
it's worth noting that this can be done in O(N log K) where K is the number of "out-of-order" elements
yairchu
I'm taking this as the accepted answer because it seems to define (and solve) my blurred and "humanized" definition of out-of-order as best as possible.
Martinho Fernandes
+1  A: 

How should the algorithm know which elements you consider out of order?

If the rule is "list[i+1] should always be list[i] + 1", then it would be easy, of course, just memorizing that after 1, the next element should be 2, select those in between and so on. But you need a precise rule for the algorithm to determine which elements are to be considered "out-of-order".

Botz3000
A: 

As Dav said, longest increasing subsequence is probably the best you can do.

this is off the top of my head, so it may (read probably isn't :P) correct: The obvious lower bound for this problem is O(n) since you at least have to read each number once. But suppose there was an algorithm that ran in O(n) time. Then you can simply insert the out of order elements in order in linear time, but the best comparison sort algorithm have a lower bound of O(nLogn) so it's a contradiction. (otherwise, there are non-comparison sort methods like bucket or radix sort that either use a lot of memory or exploit the size of the numbers being sorted...)

Charles Ma
Correct. The factor of N is required to process each element in the list (a necessity), and the factor of log N is required to determine where that element belongs. Thus any algorithm that guarantees a sorted result will have a lower bound of O(N log N).
Amber
A: 

naïve algorithms considered harmful

It's easy to solve the problem for your test case example with backtracking, but it would still be a naïve algorithm and it would fall over on, say, a sorted array with a small number at the very end. I imagine the best you can do is sort the array efficiently and then compare (the increasing subsequence identifier algorithms don't seem any faster than that) but I presume that defeats your unstated purpose, which I bet is to avoid sorting.

I'm going to also bet that you have exactly three choices:

  1. Invoke a library sort routine and call it a day.
  2. Implement a simple sort that works well with almost-sorted input. (Don't use bubble sort! It has an easily hit worst case .. imagine a small element at the end of the list. Bingo, n**2. Cocktail sort solves this problem, but it's dangerous too, just less dangerous.)
  3. Don't let the list get unsorted in the first place. I know, these are a bit obvious.
DigitalRoss
Thanks for all the warnings. But I wasn't even thinking about using any of this in any kind of code. I was asking a purely theoretical question that arose in the discussion I linked to.
Martinho Fernandes
And btw, sorting the array and comparing will not solve my problem. Just take my example: [1,5,10,2,3,4,5,6,7,8,9,10,11] when sorted becomes [1,2,3,4,5,5,6,7,8,9,10,10,11]. You have only changed one difficult problem, that of finding elements out of order, into another difficult problem, that of diffing two lists.
Martinho Fernandes