ansaurus

Question

Finding sorted sub-sequences in a permutation

Answer 1

A:

I don't think you need an algorithm. Here is what you need. Summation (i =0 to i = n-1 ) (n - i) * (i + 1)!

bhups 2009-12-01 07:20:23

Do you imply that for a given n, the number of valid blocks is constant ???

mjv 2009-12-01 07:29:26

I guess I read the problem incorrectly. What I was thinking is all the possible valid blocks from [1, n]. thanks for pointing that out.

bhups 2009-12-01 07:44:40

Answer 2

+1 A:

This is not quite a full solution, but a starting point for more thought.

The trick appears to lie in the fact that the set is always 1,2,...,n from which it is clear that the entire set is always the first obviously valid block. If you start from that full set and work your way down, it seems to be a little easier to grasp.

The set with the most valid blocks is M = [1 2 3 4 5 . . . n] because every subset [i..j] is guaranteed to be a valid block. M is a good test case because you can easily determine the actual number of valid blocks: ((n - 1) / 2) * n.

NickLarsen 2009-12-01 22:08:46

Answer 3

+1 A:

Imagine you had a function which, given a list of n integers could tell you if they were a valid block or not.

Imagine you modified this function so it instead returned a count of how many sub blocks were valid, so given [1 3 2 4] would find [1 3 2 4], [1 3 2], [3 2 4], [3 2].

To make that modification, you'd just loop through all the possible sub blocks, passing them to the original function until you had just 2 numbers:

1,3,2,4 is valid
1,3,2 is valid
1,3 is NOT valid
3,2,4 is valid
3,2 is valid
There were 4 valid sub blocks

The first function then:

def isValid(li):
    return (max(li)-min(li) == len(li)-1)

That is, assuming all values are different, the largest value minus the smallest should yield the length of the array minus 1. For [1 3 2 4], largest = 4, smallest = 1, max-min=3, length-1 = 3, job done.

The main checking function:

def countValidSubs(li):
    count = 0
    length = len(li)
    for start in range(0,length-2):
        for newlen in range(length-start,1,-1):
            newli = li[start:start+newlen]
            if isValid(newli):
                print(','.join(`i` for i in newli)+" is valid")
                count += 1
            else:
                print(','.join(`i` for i in newli)+" is NOT valid")
    return count

Then just call like:

countValidSubs([1, 3, 2, 4, 5, 7, 9, 8, 6])

(Answer there is 14, by the way)

The only problem with this as a homework answer is that it is O(n²/2)

Phil H 2009-12-07 16:44:53

+1 even though it's not n log n

mdma 2010-07-27 01:14:47

`isValid` should have an extra line before the return that reads `if len(li) <= 1: return false` for it to be correct.

Allen 2010-08-01 21:29:43

Answer 4

+14 A:

Here's a worst-case O(n log n) divide and conquer algorithm. Given a non-empty sublist of a permutation, divide it into a left half, a middle element, and a right half. Recursively compute the number of blocks contained in the left half and the number of blocks contained in the right half. Now in O(n) time, compute the number of blocks that include the middle element as follows.

Observe that the intersection of two valid blocks is either empty or a valid block and that the whole permutation is a valid block. Together, these facts imply the existence of closures: unique minimal valid blocks that contain a specified (non-contiguous) subsequence. Define a left extension to be a closure of the middle element and an element not in the right half. Define a right extension to be a closure of the middle element and an element not in the left half. The left extensions (respectively, the right extensions) are totally ordered with respect to the sublist relation. They can be computed in order in linear time by means of a simple work-list algorithm.

Observe now that the union of two overlapping valid blocks is itself a valid block. I claim that every valid block containing the middle element can be written as the union of the left extension generated by the leftmost element with the right extension generated by the rightmost element. To count unions of this form, iterate through the left extensions in increasing order. Maintain pointers to the least right extension whose rightmost element is not left of the left extension's rightmost, and to the least whose leftmost element is left of the left extension's leftmost. Because of monotonicity, these pointers can only move to larger extensions, so the total work is linear. They bound above and below the eligible partners for the current left extension.

C++ implementation:

#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <queue>
#include <stdexcept>
#include <vector>

namespace {
typedef std::vector<int> IntVector;

struct Interval {
  int left;
  int right;
};

Interval MakeInterval(int left, int right) {
  Interval i = {left, right};
  return i;
}

typedef std::vector<Interval> IntervalVector;

enum Direction {
  kLeft,
  kRight,
};

// Finds the valid intervals obtained by starting with [pi[mid],
// pi[mid]] and repeatedly extending in direction dir
//
// O(right_boundary - left_boundary)
void FindExtensions(const IntVector& pi, const IntVector& pi_inv,
                    int left_boundary, int right_boundary,
                    Direction dir, IntervalVector* extensions) {
  int mid = left_boundary + (right_boundary - left_boundary) / 2;
  int left = mid;
  int right = mid;
  int lower = pi[mid];
  int upper = pi[mid];
  std::queue<int> worklist;
  while (true) {
    if (worklist.empty()) {
      extensions->push_back(MakeInterval(left, right));
      if (dir == kLeft) {
        if (left == left_boundary) break;
        --left;
        worklist.push(left);
      } else {
        if (right == right_boundary) break;
        ++right;
        worklist.push(right);
      }
    } else {
      int i = worklist.front();
      worklist.pop();
      if (i < left) {
        if (i < left_boundary) break;
        for (int j = left - 1; j >= i; --j) worklist.push(j);
        left = i;
      } else if (right < i) {
        if (right_boundary < i) break;
        for (int j = right + 1; j <= i; ++j) worklist.push(j);
        right = i;
      }
      int x = pi[i];
      if (x < lower) {
        for (int y = lower - 1; y > x; --y) worklist.push(pi_inv[y]);
        lower = x;
      } else if (upper < x) {
        for (int y = upper + 1; y < x; ++y) worklist.push(pi_inv[y]);
        upper = x;
      }
    }
  }
}

int CountValidRecursive(const IntVector& pi, const IntVector& pi_inv,
                        int left, int right) {
  if (right < left) return 0;
  int mid = left + (right - left) / 2;
  int count = CountValidRecursive(pi, pi_inv, left, mid - 1) +
      CountValidRecursive(pi, pi_inv, mid + 1, right);
  IntervalVector left_exts;
  FindExtensions(pi, pi_inv, left, right, kLeft, &left_exts);
  IntervalVector right_exts;
  FindExtensions(pi, pi_inv, left, right, kRight, &right_exts);
  typedef IntervalVector::const_iterator IVci;
  IVci first_good = right_exts.begin();
  IVci first_bad = right_exts.begin();
  for (IVci ext = left_exts.begin(); ext != left_exts.end(); ++ext) {
    while (first_good != right_exts.end() && first_good->right < ext->right) {
      ++first_good;
    }
    if (first_good == right_exts.end()) break;
    while (first_bad != right_exts.end() && ext->left <= first_bad->left) {
      ++first_bad;
    }
    count += std::distance(first_good, first_bad);
  }
  return count;
}

// Counts the number of intervals in pi that consist of consecutive
// integers
//
// O(n log n)
int CountValid(const IntVector& pi) {
  int n = pi.size();
  IntVector pi_inv(n, -1);
  // Validate and invert pi
  for (int i = 0; i < n; ++i) {
    if (pi[i] < 0 || pi[i] >= n || pi_inv[pi[i]] != -1) {
      throw std::runtime_error("Invalid permutation of {0, ..., n - 1}");
    }
    pi_inv[pi[i]] = i;
  }
  return CountValidRecursive(pi, pi_inv, 0, n - 1);
}

// For testing purposes
int SlowCountValid(const IntVector& pi) {
  int count = 0;
  int n = pi.size();
  for (int left = 0; left < n; ++left) {
    for (int right = left; right < n; ++right) {
      int lower = pi[left];
      int upper = pi[left];
      for (int i = left + 1; i <= right; ++i) {
        if (pi[i] < lower) {
          lower = pi[i];
        } else if (pi[i] > upper) {
          upper = pi[i];
        }
      }
      if (upper - lower == right - left) ++count;
    }
  }
  return count;
}
}  // namespace

int main() {
  int n = 10;
  IntVector pi(n);
  for (int i = 0; i < n; ++i) pi[i] = i;
  do {
    if (SlowCountValid(pi) != CountValid(pi)) {
      fprintf(stderr, "Bad permutation:");
      for (IntVector::const_iterator x = pi.begin(); x != pi.end(); ++x) {
        fprintf(stderr, " %d", *x);
      }
      fputc('\n', stderr);
    }
  } while (std::next_permutation(pi.begin(), pi.end()));
}

2010-07-27 18:26:01

That is really impressive. I'm speechless. Congratulations! Takes about 15 seconds on a quadcore to run for a million elements random permutation (even if the result overflows, it's an interesting test), which is very good considering how complex the code is and the STL containers used.

IVlad 2010-07-27 20:51:03

One of those answers for which +1 seems utterly inadequate. I run some timings on the functions and the `CountValid` starts beating `SlowCountValid` at around n = 47.

Unreason 2010-07-30 10:35:08

Here's the performance compared for n=3..99 with 100 samples (http://i32.tinypic.com/do6n3b.png) and with 1000 samples (http://i32.tinypic.com/33kfb7q.png)

Unreason 2010-07-30 11:03:28

The intersection of two valid blocks is not necessarily non-empty and a valid block. Consider [1 2 3 4 5] and [5 6 7], sub blocks of [1..7]. Their intersection is non-empty, [5] and an invalid block. By the way, thanks to Ben Schwehn who pointed out that my "answer" was incorrect. So embarrassing.

Allen 2010-08-01 21:24:04

Allen, +1 because you're technically right, but if we define one-element blocks to be valid, everything works out OK.

2010-08-01 22:01:29

ansaurus

tags:

views:

answers:

Finding sorted sub-sequences in a permutation

related questions