Hi all,

I am running some string matching tests using the Smith-Waterman algorithm. I am currently using SimMetrics (the Java open source project) to run the tests.

Can anyone explain why when I compare 'Bloggs J.' to 'Bloggs' I get a similarity value of 1.0?

There is obviously a gap (e.g. 'o' and '.'), but it does not appear to be penalized.

Thank you in advance.

+1  A: 

The Smith-Waterman algorithm is a local alignment algorithm: it finds the best-matching substrings of the two inputs rather than aligning the strings end to end. The "gap" you describe is not penalized because it falls outside the aligned region. 'Bloggs' matches the first six characters of 'Bloggs J.' exactly, and the trailing ' J.' is simply left out of the alignment. No six-character string could align better to 'Bloggs J.' than 'Bloggs' does, so the score is the maximum possible. If you want a global alignment, which charges for every unmatched character, use the Needleman-Wunsch algorithm instead.
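To make the difference concrete, here is a minimal standalone sketch in Python. It is not SimMetrics code. The scoring values (match 1.0, mismatch -2.0, gap -0.5) and the normalization by the shorter string's best possible score are assumptions chosen for illustration, and the function names are made up for this example.

```python
def smith_waterman_score(a, b, match=1.0, mismatch=-2.0, gap=-0.5):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for ch_a in a:
        curr = [0.0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # The floor at 0 lets an alignment start anywhere, which is
            # what makes the algorithm local.
            cell = max(0.0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def local_similarity(a, b, match=1.0, mismatch=-2.0, gap=-0.5):
    """Local score scaled to [0, 1] by the shorter string's best score.

    This normalization is an assumption made for this sketch.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 1.0 if len(a) == len(b) else 0.0
    return smith_waterman_score(a, b, match, mismatch, gap) / (shorter * match)


def global_score(a, b, match=1.0, mismatch=-2.0, gap=-0.5):
    """Needleman-Wunsch style score: every character must be accounted for."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ch_a in enumerate(a, start=1):
        curr = [i * gap]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(local_similarity("Bloggs J.", "Bloggs"))  # 1.0
print(global_score("Bloggs J.", "Bloggs"))      # 4.5
```

Under these assumed scores, the local alignment covers only 'Bloggs' and reaches the maximum of 6.0, so the normalized similarity is 1.0. The global alignment must also pay three gap penalties for ' J.', giving 6.0 - 1.5 = 4.5.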

dsimcha