Longest Common Subsequence

views:

154

answers:

+1 Q:

Longest Common Subsequence

Consider 2 sequences X[1..m] and Y[1..n]. The memoization algorithm would compute the LCS in time O(m*n). Is there any better algorithm to find out LCS wrt time? I guess memoization done diagonally can give us O(min(m,n)) time complexity.

+1 A:

This classic problem is considered NP-Hard.

Here is the Wikipedia Article for more information.

Sev 2010-06-09 06:06:39

-1. Calling this "NP-hard" is very misleading -- that's for an arbitrary number k of sequences. The OP's question is about 2 sequences, which is not NP-hard.

j_random_hacker 2010-09-08 10:39:24

I wrote that the classic problem is considered np-hard, and the classic one is arbitrary sequences and np-hard

Sev 2010-09-08 18:36:25

It's still misleading. "This classic problem" doesn't mean "A more general version of the problem you asked about". The only reasonable interpretation of your first sentence is, "The problem you asked about, which by the way is a classic problem, is considered NP-hard." Which is false.

j_random_hacker 2010-09-09 04:35:54

You mean the classic problem is not np-hard either?

Sev 2010-09-09 04:48:27

"The classic problem" does not mean what you think it means! It's perfectly correct to describe the 2-sequence version of the LCS problem, which is solvable in polynomial time, as "classic". The k-sequence version can also be described as "a classic problem". The k-sequence version, which is *not* what the OP asked about, is NP-hard.

j_random_hacker 2010-09-10 11:35:27

To hopefully further clarify things: "Classic" in this context just means "established a long time ago, extensively studied, considered to be interesting by experts". It does not imply generalisation. I would suggest changing your first sentence to: "The problem on 2 sequences is solvable in O(mn) time. The generalisation to *k* sequences makes the problem NP-hard, requiring time exponential in *k* to solve."

j_random_hacker 2010-09-10 11:54:33

+1 A:

If you know a priori an upper bound on the maximum size k you care about, you can force the LCS algorithm to exit early by adding an extra check in the inner loop. This means then when k << min(m,n) you can get small running times in spite of the fact you are doing LCS.

Ira Baxter 2010-06-09 06:21:13

+1 A:

Gene Myers in 1986 came up with a very nice algorithm for this, described here: An O(ND) Difference Algorithm and Its Variations.

This algorithm takes time proportional to the edit distance between sequences, so it is much faster when the difference is small. It works by looping over all possible edit distances, starting from 0, until it finds a distance for which an edit script (in some ways the dual of an LCS) can be constructed. This means that you can "bail out early" if the difference grows above some threshold, which is sometimes convenient.

I believe this algorithm is still used in many diff implementations.

j_random_hacker 2010-09-10 11:56:02

ansaurus

tags:

views:

answers:

Longest Common Subsequence

related questions