views:

55

answers:

3

I'm attempting to write a basic dna sequencer. In that, given two sequences of the same length, it will output the strings which are the same, with a minimal length of 3. So input of

abcdef dfeabc

will return

1 abc

I am not sure how to go about solving the problem. I can compare the two strings, and see if they are completely equal. From there, i can compare length-1 string size, i.e. if dfeabc contains abcde. However, how can i get the program to do all possible strings, down to a minimal size of 3 characters? One issue is for the above example of length-1, I'd also have to do the string bcdef and compare that.

+1  A: 

The naive way would be to get every single substring of string A and see if it's in string B.

Here's the naive way of doing that:

for ( int i = 0; i < a.length; i++ ) {
   for ( int j = i+1; j <= a.length; j++ ) {
       if (b.contains(a.substring(i, j))) {
           //if longer than last found match, record this match
       }
   }
}

The slightly more optimal way would be to look at longer substrings first, so that the first substring that matches is necessarily the longest.

for ( int length = a.length; length > 0; length-- ) {
     for ( int i = 0; i + length < a.length; i++ ) {
         if ( b.contains(a.substring(i, i+length)) ) {
             //this is the biggest match
         }
     }
}
Mark Peters
Certainly well-intended, but too much help for a homework question, IMO :-)
Fabian Steeg
Meh I was thinking if it's homework, which it appears to be, he probably isn't allowed to use the contains method (that'd be cheating) so he'd still have some blanks to fill in.
Mark Peters
To be honest, this was perfect, I already have something close, i just didn't have the second length loop to make it work properly. And its true, i can't use contains, so there is still a bit to do. Its not even for an assignment though, just a lab, so the naive solution is all I needed.
Blackbinary
A: 

If you wanted to solve this in a simple way, without using any Java API for searching, you could think of it like this: for every pair of possible starting indices i in the first string and j in the second string, step forward in both while the corresponding characters in the first and the second string are equal, and return a result if you did at least 3 steps.

Fabian Steeg
A: 

You need to use the Longest Common Substring algorithm, a dynamic programming problem. Check Wikipedia's entry for an explanation of the algorithm and here for a sample implementation.

teto