views:

154

answers:

3

Consider 2 sequences X[1..m] and Y[1..n]. The memoization algorithm would compute the LCS in time O(m*n). Is there any better algorithm to find out LCS wrt time? I guess memoization done diagonally can give us O(min(m,n)) time complexity.

+1  A: 

This classic problem is considered NP-Hard.

Here is the Wikipedia Article for more information.

Sev
-1. Calling this "NP-hard" is very misleading -- that's for an arbitrary number k of sequences. The OP's question is about 2 sequences, which is not NP-hard.
j_random_hacker
I wrote that the classic problem is considered np-hard, and the classic one is arbitrary sequences and np-hard
Sev
It's still misleading. "This classic problem" doesn't mean "A more general version of the problem you asked about". The only reasonable interpretation of your first sentence is, "The problem you asked about, which by the way is a classic problem, is considered NP-hard." Which is false.
j_random_hacker
You mean the classic problem is not np-hard either?
Sev
"The classic problem" does not mean what you think it means! It's perfectly correct to describe the 2-sequence version of the LCS problem, which is solvable in polynomial time, as "classic". The k-sequence version can also be described as "a classic problem". The k-sequence version, which is *not* what the OP asked about, is NP-hard.
j_random_hacker
To hopefully further clarify things: "Classic" in this context just means "established a long time ago, extensively studied, considered to be interesting by experts". It does not imply generalisation. I would suggest changing your first sentence to: "The problem on 2 sequences is solvable in O(mn) time. The generalisation to *k* sequences makes the problem NP-hard, requiring time exponential in *k* to solve."
j_random_hacker
+1  A: 

If you know a priori an upper bound on the maximum size k you care about, you can force the LCS algorithm to exit early by adding an extra check in the inner loop. This means then when k << min(m,n) you can get small running times in spite of the fact you are doing LCS.

Ira Baxter
+1  A: 

Gene Myers in 1986 came up with a very nice algorithm for this, described here: An O(ND) Difference Algorithm and Its Variations.

This algorithm takes time proportional to the edit distance between sequences, so it is much faster when the difference is small. It works by looping over all possible edit distances, starting from 0, until it finds a distance for which an edit script (in some ways the dual of an LCS) can be constructed. This means that you can "bail out early" if the difference grows above some threshold, which is sometimes convenient.

I believe this algorithm is still used in many diff implementations.

j_random_hacker