views:

166

answers:

6

Could you tell me how I can calculate the DNA sequences by Java using Levenshtein algorithm

A: 

The wiki for Levenshtein contains an algorithm and an explanation of the resulting matrix. Simply implement the algorithm as a method and return the last element in the matrix.

Tim Bender
+2  A: 

Here is the algorithm from the Wikipedia page on Levenshtein distances:

 int LevenshteinDistance(char s[1..m], char t[1..n])
 {
   // d is a table with m+1 rows and n+1 columns
   declare int d[0..m, 0..n]

   for i from 0 to m
     d[i, 0] := i // deletion
   for j from 0 to n
     d[0, j] := j // insertion

   for j from 1 to n
   {
     for i from 1 to m
     {
       if s[i] = t[j] then 
         d[i, j] := d[i-1, j-1]
       else
         d[i, j] := minimum
                    (
                      d[i-1, j] + 1,  // deletion
                      d[i, j-1] + 1,  // insertion
                      d[i-1, j-1] + 1 // substitution
                    )
     }
   }

   return d[m, n]
 }

(I'm sure you can make java out of that with a little work.)

pass in your two DNA sequences as s and t and it will return the distance as an int.

beggs
A: 

Copy/Paste the function from the Levenshtein Distance Algorithm and use it like so:

 String a = "AAAAAAAAAAAAAAAAAA";
 String b = "AAAAAAAAACTAAAAAAA";

 int d = getLevenshteinDistance(a,b);
 System.out.println(d);
Ron Gejman
A: 

If you are just interested in calculating the variation between two DNA sequences you should use the Damerau–Levenshtein distance not the regular Levenshtein distance.

The wikipedia entry contains some sample code which you surely are able to map to java code.

jitter
+2  A: 

I believe this is what you're after. You can remove the System.out.println statements if you like. Note that if you leave them in, that the first row and columns are omitted from what is printed.

Verified against the results on the wikipedia page.

public int getLevenshteinDistance(String a, String b)
{
    // d is a table with m+1 rows and n+1 columns
    char[] s = (a).toCharArray();
    char[] t = (b).toCharArray();
    System.out.println(a + " - " + b);
    int m = s.length;
    int n = t.length;
    int[][] d = new int[m + 1][n + 1];

    int i;
    int j;
    for(i = 0; i < (m + 1); i++)
    {
        d[i][0] = i; //deletion
    }

    for(j = 0; j < (n + 1); j++)
    {
        d[0][j] = j; //insertion
    }

    for (j = 1; j < (n + 1); j++)
    {
        for (i = 1; i < (m + 1); i++)
        {
            if (s[i-1] == t[j-1])
            {
                d[i][j] = d[i-1][j-1];
            }
            else
            {
                d[i][j] = Math.min((d[i-1][j] + 1), //deletion
                        (Math.min((d[i][j-1] + 1), //insertion
                        (d[i-1][j-1] + 1)))); //substitution
            }
            System.out.print(" [" + d[i][j] + "]");
        }
        System.out.println("");
    }

    return d[m][n];
}

To test:

    String a = "Saturday";
    String b = "Sunday";
    int d = getLevenshteinDistance(a, b);
    System.out.println(d);
    a = "kitten";
    b = "sitting";
    d = getLevenshteinDistance(a, b);
    System.out.println(d);
bguiz
+3  A: 

Since you did not tag it as homework, I see no need in writing this yourself. Apache's StringUtils has it.

Bart Kiers