I'm using this piece of Java code to find similar strings:
if (str1.indexOf(str2) >= 0 || str2.indexOf(str1) >= 0) .......
but with str1 = "pizzabase"
and str2 = "namedpizzaowl"
it doesn't work.
If your algorithm says two strings are similar when they contain a common substring, then this algorithm will always return true; the empty string "" is trivially a substring of every string. Also, it makes more sense to determine the degree of similarity between strings and return a number rather than a boolean.
This is a good algorithm for determining string (or more generally, sequence) similarity: http://en.wikipedia.org/wiki/Levenshtein_distance.
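To make that concrete, here is a minimal sketch of the Levenshtein distance in Java, using the usual two-row dynamic programming table. The class name Levenshtein and the similarity helper (which scales the distance into a score between 0 and 1) are my own illustrative choices, not part of any standard library.

```java
final class Levenshtein {
    // Number of single-character insertions, deletions, or substitutions
    // needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full table.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: 1.0 means identical, 0.0 means nothing in common.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) distance(a, b) / max;
    }
}
```

You would then compare Levenshtein.similarity(str1, str2) against a threshold of your choosing instead of testing indexOf.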
Iterate over each letter in str1, checking for its existence in str2. If it doesn't exist, move on to the next letter; if it does, increase the length of the substring of str1 that you check for in str2 to two characters, and repeat until no further matches are found or you have iterated through str1.
This will find the longest shared substring starting at each position of str1, but, like bubble sort, it is hardly optimal; it is just a very basic example of how to solve the problem.
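Something like this example:
import java.util.ArrayList;
import java.util.List;

List<String> matches = new ArrayList<>();
for (int pos = 0; pos < str1.length(); pos++) {
    int len = 1;
    // Grow the candidate while it still fits in str1 and occurs somewhere in str2.
    // Note: substring takes (begin, end), and indexOf returns -1 when nothing is found.
    while (pos + len <= str1.length()
            && str2.indexOf(str1.substring(pos, pos + len)) >= 0) {
        len++;
    }
    // len overshoots by one; skip positions where not even one character matched.
    if (len > 1) {
        matches.add(str1.substring(pos, pos + len - 1));
    }
}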
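If you only need the single longest shared substring, the repeated indexOf scans can be avoided. Below is a rough sketch of the standard dynamic programming approach to the longest common substring problem; it is a different technique from the loop above, and the names CommonSubstring and longest are just placeholders I picked for illustration.

```java
final class CommonSubstring {
    // Returns the longest substring that occurs in both a and b
    // (the first one found if several share the maximum length).
    static String longest(String a, String b) {
        // prev[j] / curr[j]: length of the common suffix of the prefixes
        // ending at the previous / current character of a and at b[j - 1].
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        int bestLen = 0;
        int bestEnd = 0; // exclusive end index of the best match in a
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                    if (curr[j] > bestLen) {
                        bestLen = curr[j];
                        bestEnd = i;
                    }
                } else {
                    curr[j] = 0;
                }
            }
            // Swap rows; every curr[j] for j >= 1 is overwritten on the next pass.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return a.substring(bestEnd - bestLen, bestEnd);
    }
}
```

With the strings from the question, CommonSubstring.longest("pizzabase", "namedpizzaowl") returns "pizza". You could then treat the strings as similar when the result is longer than some minimum length.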