I need a bit of help with is this code. I know the sections that should be recursive, or at least I think I do but am not sure how to implement it. I am trying to implement a path finding program from an alignment matrix that will find multiple routes back to the zero value. For example if you excute my code and insert CGCA as the first sequence and CACGTAT as the second sequence, and 1, 0, and -1 as the match, mismatch and gap scores. The program gives off the path as HDHHDD and the aligment as
CACGTAT
CGC--A-.
However there are more possible paths and aligments then this, except I don't know how many. What I want to do is have a piece of my code loop back on itself and find other paths and alignments, using the same code as the first time around, until it runs out of possible alignments. The best way I found on the net to do this is recursion, except no one can explain how to do it. In this case there should be two more paths and aligmennts HDDDHHD and CACGTAT, and C--GCA-, and. HDDDDHH , CACGTAT AND --CGCA-. I just don't know how to code to perform this task.
# Implementation of Needleman and Wunsch Algorithm
my($seq1, $len1, $seq2, $len2, $data, @matrix, $i, $j, $x, $y, $val1, $val2);
my($val3, $pathrow, $pathcol, $seq1loc, $seq2loc, $gapscore, $matchscore, $mismatchscore);
#first obtain the data from the user.
print "Please enter the first sequence for comaprsion\n";
$seq1=<STDIN>;
chomp $seq1;
print "Please enter the second sequence for comparsion\n";
$seq2=<STDIN>;
chomp $seq2;
# adding extra characters so sequences align with matrix
# saves some calculations later on
$seq1 = " " . $seq1;
$seq2 = " " . $seq2;
$len1 = length($seq1);
$len2 = length($seq2);
print "Enter the match score: ";
$matchscore=<STDIN>;
chomp $matchscore;
print "Enter the mismatch score: ";
$mismatchscore=<STDIN>;
chomp $mismatchscore;
print "Enter the gap score: ";
$gapscore=<STDIN>;
chomp $gapscore;
# declare a two dimensional array and initialize to spaces
# array must contain one extra row and one extra column
@matrix = ();
for($i = 0; $i < $len1; $i++){
for($j = 0; $j < $len2; $j++){
$matrix[$i][$j] = ' ';
}
}
# initialize 1st row and 1st column of matrix
$matrix[0][0] = 0;
for ($i = 1; $i < $len1; $i ++){
$matrix[$i][0] = $matrix[$i-1][0] + $gapscore;
}
for ($i = 1; $i < $len2; $i ++){
$matrix[0][$i] = $matrix[0][$i-1] + $gapscore;
}
# STEP 1:
# Fill in rest of matrix using the following rules:
# determine three possible values for matrix[x][y]
# value 1 = add gap score to matrix[x][y-1]
# value 2 = add gap score to matrix[x-1][y]
# value 3 = add match score or mismatch score to
# matrix[x-1][y-1] depending on nucleotide
# match for position x of $seq1 and position y
# of seq2
# place the largest of the three values in matrix[x][y]
#
# Best alignment score appears in matrix[$len1][$len2].
for($x = 1; $x < $len1; $x++){
for($y = 1; $y < $len2; $y++){
$val1 = $matrix[$x][$y-1] + $gapscore;
$val2 = $matrix[$x-1][$y] + $gapscore;
if (substr($seq1, $x, 1) eq substr($seq2, $y, 1)){
$val3 = $matrix[$x-1][$y-1] + $matchscore;
}
else{
$val3 = $matrix[$x-1][$y-1] + $mismatchscore;
}
if (($val1 >= $val2) && ($val1 >= $val3)){
$matrix[$x][$y] = $val1;
}
elsif (($val2 >= $val1) && ($val2 >= $val3)){
$matrix[$x][$y] = $val2;
}
else{
$matrix[$x][$y] = $val3;
}
}
}
# Display scoring matrix
print "MATRIX:\n";
for($x = 0; $x < $len1; $x++){
for($y = 0; $y < $len2; $y++){
print "$matrix[$x][$y] ";
}
print "\n";
}
print "\n";
# STEP 2:
# Begin at matrix[$len1][$len2] and find a path to
# matrix[0][0].
# Build string to hold path pattern by concatenating either
# 'H' (current cell produced by cell horizontally to left),
# 'D' (current cell produced by cell on diagonal),
# 'V' (current cell produced by cell vertically above)
# ***This is were I need help I need this code to be recursive, so I can find more then one path***
$pathrow = $len1-1;
$pathcol = $len2-1;
while (($pathrow != 0) || ($pathcol != 0)){
if ($pathrow == 0){
# must be from cell to left
$path = $path . 'H';
$pathcol = $pathcol - 1;
}
elsif ($pathcol == 0){
# must be from cell above
$path = $path . 'V';
$pathrow = $pathrow - 1;
}
# could be from any direction
elsif (($matrix[$pathrow][$pathcol-1] + $gapscore) == $matrix[$pathrow][$pathcol]){
# from left
$path = $path . 'H';
$pathcol = $pathcol - 1;
}
elsif (($matrix[$pathrow-1][$pathcol] + $gapscore) == $matrix[$pathrow][$pathcol]){
# from above
$path = $path . 'V';
$pathrow = $pathrow - 1;
}
else{
# must be from diagonal
$path = $path . 'D';
$pathrow = $pathrow - 1;
$pathcol = $pathcol - 1;
}
}
print "Path is $path\n";
# STEP 3:
# Determine alignment pattern by reading path string
# created in step 2.
# Create two string variables ($alignseq1 and $alignseq2) to hold
# the sequences for alignment.
# Reading backwards from $seq1, $seq2 and path string,
# if string character is 'D', then
# concatenate to front of $alignseq1, last char in $seq1
# and to the front of $alignseq2, last char in $seq2
# if string character is 'V', then
# concatenate to front of $alignseq1, last char in $seq1
# and to the front of $alignseq2, the gap char
# if string character is 'H', then
# concatenate to front of $alignseq1 the gap char
# and to the front of $alignseq2, last char in $seq2
# Continue process until path string has been traversed.
# Result appears in string $alignseq1 and $seq2
***#I need this code to be recursive as well.***
$seq1loc = $len1-1;
$seq2loc = $len2-1;
$pathloc = 0;
print length($path);
while ($pathloc < length($path)){
if (substr($path, $pathloc, 1) eq 'D'){
$alignseq1 = substr($seq1, $seq1loc, 1) . $alignseq1;
$alignseq2 = substr($seq2, $seq2loc, 1) . $alignseq2;
$seq1loc--;
$seq2loc--;
}
elsif (substr($path, $pathloc, 1) eq 'V'){
$alignseq1 = substr($seq1, $seq1loc, 1) . $alignseq1;
$alignseq2 = '-' . $alignseq2;
$seq1loc--;
}
else{ # must be an H
$alignseq1 = '-' . $alignseq1;
$alignseq2 = substr($seq2, $seq2loc, 1) . $alignseq2;
$seq2loc--;
}
$pathloc++;
}
print "\nAligned Sequences:\n";
print "$alignseq2 \n";
print "$alignseq1 \n";
# statement may be needed to hold output screen
print "Press any key to exit program";
$x = <STDIN>;
If anyone is wondering this is a needleman-wunsch algorithm. Any help here would be greatly apperciated.