views:

415

answers:

4

For 2 strings, I want to find the number of distinct LCS's. I read on wiki on how to print all LCS's but how to check that they are distinct? The hash table is not feasible as my input string each can be 1500-2000 characters long so maximum number of LCS's can be 2000 choose 1000

A: 

Can you specify.

  • Do substrings have to be consecutive, or any sub sequence would be ok? For example, in "abcde" would "abde" be a valid substring?
  • What do you mean by distinct?
husayt
No they dont need to be consecutive. It is the same standard LCS problem of dp. Its just that to find the number of distinct LCS if multiple LCS occur
Wifi
To answer your first question: no. This is not a substring by definition. Rather, it is a subsequence (again, by definition).
Konrad Rudolph
can you explain what do you mean by distinct?
husayt
It will also be a good idea to show what you want on example. As there are a number of similar problems, with tiny diffs. But these diffs they are enough to change the approach dramatically.
husayt
A: 

You can use a hash table, but instead of storing the whole substring, you just store (a list of) the beginning and end of it relative to the original string. This way, you can do a string compare in the original string if there are any collisions.

Johannes Hoff
+1  A: 

Throw the two strings to a suffix tree. This is time and space linear in length of the concatenation of the two strings.

David Lehavi
This is the answer I was thinking of posting. This data structure is best for finding common subsequences when the most flexibility is needed. For example, if there are lots of duplicated long subsequences, this data structure will have acceptable performance.
Heath Hunnicutt
I'm not aware that a suffix tree can be used for longest common *subsequences*. They can however be used to find the longest common substring.
MAK
A: 

Hi, Good god I found a posting on this topic. Actually, I faced this problem in Amazon interview recently and I sent a code to them for review. They did not tell me whether its correct or wrong but I did not get any further inputs from them (so I assumed i have erred somewhere). I dont want the efficient implementation as you guys discuss. Can anyone just review the code I've written to check its correct or not.

http://www.technicalypto.com/2010/02/find-common-sequence-that-is-out-of.html

Many thanks

Bragboy