I'm trying to compare data from two sources.

ORIG Kick-Ass: Music From The Motion Picture
ALT Kick-A*s (Music from the Motion Picture)
ALT Kick-Ass: (Music from the Motion Picture)[Explicit]
ALT Kick-Ass: A dedication

ALT 1, ALT 2 and ORIG are the same title. ALT 3 is a dummy result.

I need to verify that these match. Are there any functions available to me in PHP for this? I was thinking of counting each individual character with count_chars, then comparing that to the ORIG string as a percentage match. However, if it's a short title it wouldn't work too well.

Do you have any ideas how I could verify that they match?

Cheers,

J

+5  A: 

Well, there's always the Levenshtein distance (PHP's levenshtein() function), but I'm not sure how ultimately useful that would be for you.

Could be worth a shot, though.
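As a rough sketch of how this could be applied to the titles in the question: the raw distance grows with string length, so it helps to strip punctuation and case first and then divide by the longer length to get a score between 0 and 1. The helper names normalise_title() and title_similarity() and the 0.75 cut-off are assumptions made up for this illustration, not part of PHP. Note that levenshtein() works on bytes, and that the regex below simply drops non-ASCII characters, so this only suits plain ASCII titles.

```php
<?php
// Hypothetical helper: lower-case the title and strip punctuation
// so that "Kick-A*s (Music ...)" and "Kick-Ass: Music ..." look alike.
function normalise_title(string $title): string
{
    $title = strtolower($title);
    // Replace every run of characters that are not a-z, 0-9 or space with one space.
    $title = preg_replace('/[^a-z0-9 ]+/', ' ', $title);
    // Collapse repeated whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $title));
}

// Hypothetical helper: 1.0 means identical after normalisation,
// values near 0.0 mean almost nothing in common.
function title_similarity(string $a, string $b): float
{
    $a = normalise_title($a);
    $b = normalise_title($b);
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 1.0; // two empty strings are treated as identical
    }
    return 1.0 - levenshtein($a, $b) / $longest;
}

$orig = 'Kick-Ass: Music From The Motion Picture';
$alts = [
    'Kick-A*s (Music from the Motion Picture)',
    'Kick-Ass: (Music from the Motion Picture)[Explicit]',
    'Kick-Ass: A dedication',
];

$threshold = 0.75; // assumed cut-off, tune it against real data

foreach ($alts as $alt) {
    $score = title_similarity($orig, $alt);
    printf("%.2f %s %s\n", $score, $score >= $threshold ? 'MATCH' : 'no   ', $alt);
}
```

With these three titles, the first two alternatives score above the assumed threshold and the "A dedication" title falls well below it, but the cut-off would need checking against a larger sample.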

Peter Bailey
I've used this function with some degree of success for exactly this kind of thing
seengee
+1 for the related function
galambalazs
Looks great, I'll be using it to find the nearest match in an array of strings so it looks perfect. I'll be sure to ask again if it doesn't work out! Thanks for fast reply.
Jamie
+1  A: 

You can try something like the Hamming distance.
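PHP has no built-in function for this as far as I know, so here is a minimal sketch. hamming_distance() is a hypothetical helper written for this example. It counts the positions at which two strings differ, which only makes sense when both strings have the same length, so the sketch throws otherwise.

```php
<?php
// Hypothetical helper: number of positions where two equal-length
// strings have different bytes.
function hamming_distance(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException(
            'Hamming distance needs strings of equal length.'
        );
    }

    $distance = 0;
    for ($i = 0, $n = strlen($a); $i < $n; $i++) {
        if ($a[$i] !== $b[$i]) {
            $distance++;
        }
    }
    return $distance;
}

// Same length, one character differs ("s" vs "*").
echo hamming_distance('Kick-Ass', 'Kick-A*s'), "\n"; // 1
```

The full titles in the question have different lengths, so this would throw for them unless they were padded or truncated first.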

galambalazs
+1 for the alternative; however, Hamming distance is only defined for strings of the same length, which may not be the case here.
Jamie
+2  A: 

You could consider using edit distance:

http://en.wikipedia.org/wiki/Levenshtein_distance

The PHP function:

http://ca2.php.net/levenshtein

It returns the minimum number of single-character changes (insertions, deletions and substitutions) you would have to make to transform one string into another.
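A quick sketch of the call, using a textbook word pair rather than anything from the question. By default each insertion, replacement and deletion costs 1. The optional cost arguments are part of levenshtein() itself; setting the replacement cost to 2 makes a replacement cost the same as a delete plus an insert, which gives a count based on insertions and deletions only.

```php
<?php
// Default costs: k->s and e->i are replacements, the final "g" is an insertion.
$default = levenshtein('kitten', 'sitting');
echo $default, "\n"; // 3

// Argument order is insertion cost, replacement cost, deletion cost.
// With replacement at 2, each replacement counts as delete + insert.
$indelOnly = levenshtein('kitten', 'sitting', 1, 2, 1);
echo $indelOnly, "\n"; // 5

// One character differs between these two spellings.
echo levenshtein('Kick-Ass', 'Kick-A*s'), "\n"; // 1
```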

hth

paintcan
+1 for the information, however Peter already posted the same answer. Cheers
Jamie