I have strings of various lengths made up of Base64 characters. They are actually audio recognition data that differ from song to song.
To compare parts of these strings easily, I divide them into 16-character substrings (each is about 1 second of a song). But in some cases I can't simply compare them head to head; I need to measure how similar they are.
For example, comparing 'hellohellohelloo' with 'hallohellohelloo' should give a closer value than comparing 'hellohellohelloo' with 'herehellohelloo'.
Is there any algorithm or theoretical approach for this?
Edit: Sorry, I am new here :) and I couldn't make myself clear. Here are some of my comments that should clarify the question and propose an idea.
Comment 1:
Actually, I know about Levenshtein distance, but the problem is that every time I compare two strings I have to build a comparison matrix, and that makes the search slow. If I could convert, for example, hello to 4444 and hallo to 4443, I could find how close my records are to 'hello' just by indexing the numerical values.
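To make the cost concrete, here is a minimal sketch of Levenshtein distance that keeps only two rows in memory instead of the whole matrix. Python is an assumption here (no language is named above), and the function name levenshtein is just illustrative. It still does one cell of work per character pair, so it reduces memory use but not the number of comparisons needed during a search.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with two rolling rows, not a full matrix."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the row short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


base = "hellohellohelloo"
print(levenshtein(base, "hallohellohelloo"))  # 1
print(levenshtein(base, "herehellohelloo"))   # 3
```

With the two example pairs from above, the first pair gets the smaller distance, which is the ordering I am after.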
Comment 2:
Maybe I should pick one or more constant-length base strings and store each string's distance from them as its index value. It's just an idea.
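A rough sketch of how this idea could work, assuming Levenshtein distance as the measure and Python as the language. Because Levenshtein distance satisfies the triangle inequality, any string within radius r of a query must have a stored distance to the base string that lies between d(base, query) - r and d(base, query) + r. The stored distances can therefore be used as a sorted numeric index to discard most records before doing the expensive comparison. The class name PivotIndex, the base string of sixteen 'a' characters, and the sample records are all made up for illustration.

```python
from bisect import bisect_left, bisect_right


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with two rolling rows."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


class PivotIndex:
    """Hypothetical index: records sorted by distance to one base string."""

    def __init__(self, pivot: str, chunks: list[str]) -> None:
        self.pivot = pivot
        # Computed once at build time; this is the stored "index value".
        self.entries = sorted((levenshtein(pivot, c), c) for c in chunks)
        self.keys = [d for d, _ in self.entries]

    def search(self, query: str, radius: int) -> list[str]:
        dq = levenshtein(self.pivot, query)
        # Triangle inequality: matches must have a key in [dq - r, dq + r].
        lo = bisect_left(self.keys, dq - radius)
        hi = bisect_right(self.keys, dq + radius)
        # Only the surviving candidates get a real distance computation.
        return sorted(c for _, c in self.entries[lo:hi]
                      if levenshtein(query, c) <= radius)


records = ["hellohellohelloo", "hallohellohelloo",
           "herehellohelloo", "zzzzzzzzzzzzzzzz"]
index = PivotIndex("aaaaaaaaaaaaaaaa", records)
print(index.search("hellohellohelloo", 1))
```

A single base string may filter out very little when many records sit at a similar distance from it, so in practice several base strings would probably be needed, keeping only the records that pass the range check for all of them.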