Hi everyone.
I have a small problem with the Core Data application I'm currently writing. I have two different models, contexts, and persistent stores: one for my app's data, the other for data from a website with information relevant to me.
Most of the time, I can match a record from my app to exactly one record from the other source. Sometimes, however, I have to fall back to fuzzy string matching to link the two records.
I'm trying to match song titles. For example, my local title could be the (made-up) "The French Idealist is in your pensée",
and the remote song title could be "01 - 10 - French idealist in in you're pensee, The (dub remix, feat. DJ Objective-C)".
I've searched Stack Overflow, Google, and the Cocoa documentation, and I can't find a clear answer on how to do fuzzy matching in cases like this. My strings can start with anything, contain a bunch of special characters, and usually end with random characters that should be ignored.
Regular expressions won't do, nor will NSPredicate; Soundex doesn't work well with foreign names; and plain Levenshtein distance might not be enough (or will it?).
I'm looking for a title among about a dozen potential matches, but I have to do this operation quite often. 100% accuracy is not the goal.
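A minimal sketch in plain C (Objective-C is a superset of C, so this could be dropped into a Cocoa project as-is): the textbook two-row dynamic-programming Levenshtein distance, plus a hypothetical `best_match` helper that picks the closest of a handful of candidates. Scoring by distance divided by the longer string's length, and the `max_ratio` cut-off, are assumptions for illustration, not an established standard.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Classic dynamic-programming Levenshtein distance (insertions, deletions,
   substitutions), keeping only two rows of the table:
   O(len(a) * len(b)) time, O(len(b)) extra memory.
   Returns SIZE_MAX if memory allocation fails. */
size_t levenshtein(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    size_t *prev = malloc((lb + 1) * sizeof *prev);
    size_t *curr = malloc((lb + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return SIZE_MAX;
    }
    for (size_t j = 0; j <= lb; j++) prev[j] = j;   /* "" -> b[0..j) */
    for (size_t i = 1; i <= la; i++) {
        curr[0] = i;                                  /* a[0..i) -> "" */
        for (size_t j = 1; j <= lb; j++) {
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            size_t del = prev[j] + 1;
            size_t ins = curr[j - 1] + 1;
            size_t sub = prev[j - 1] + cost;
            size_t best = del < ins ? del : ins;
            curr[j] = best < sub ? best : sub;
        }
        size_t *tmp = prev;                           /* current row becomes previous */
        prev = curr;
        curr = tmp;
    }
    size_t d = prev[lb];
    free(prev);
    free(curr);
    return d;
}

/* Hypothetical helper: returns the index of the candidate closest to `query`,
   or -1 if no candidate's normalized distance (distance / longer length)
   is strictly below `max_ratio`. Ties keep the earliest candidate. */
int best_match(const char *query, const char *const candidates[], int n,
               double max_ratio) {
    int best = -1;
    double best_ratio = max_ratio;
    size_t lq = strlen(query);
    for (int i = 0; i < n; i++) {
        size_t lc = strlen(candidates[i]);
        size_t longest = lq > lc ? lq : lc;
        if (longest == 0) continue;                   /* both empty: skip */
        double ratio = (double)levenshtein(query, candidates[i]) / (double)longest;
        if (ratio < best_ratio) {
            best_ratio = ratio;
            best = i;
        }
    }
    return best;
}
```

With about a dozen candidates of a few dozen characters each, every comparison fills only a small table, so running this often should be cheap; the threshold would need tuning against real titles.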
I was thinking of removing the ignored words, extracting the keywords (in this example, "french", "idealist", "pensée"), concatenating them, and then computing the Levenshtein distance between the results (the words of a song title should appear in the same order in both sources).
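A minimal sketch of the keyword-extraction step, again in plain C so it can be called from Objective-C. The `extract_keywords` name, the stopword list, and the rules (skip bracketed text, drop apostrophes, discard all-digit tokens such as track numbers) are hypothetical choices tuned to the two example titles. It works on raw bytes: non-ASCII bytes are simply dropped, so UTF-8 "pensée" becomes "pense". Proper accent folding beforehand (for instance with NSString's `stringByFoldingWithOptions:locale:` and `NSDiacriticInsensitiveSearch`) is assumed rather than shown.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stopword list, for illustration only. */
static const char *const kStopwords[] = {
    "a", "an", "the", "is", "in", "of", "you", "your", "youre", "feat", NULL
};

static int is_stopword(const char *word) {
    for (size_t i = 0; kStopwords[i] != NULL; i++)
        if (strcmp(word, kStopwords[i]) == 0) return 1;
    return 0;
}

static int is_all_digits(const char *word) {
    for (; *word != '\0'; word++)
        if (!isdigit((unsigned char)*word)) return 0;
    return 1;
}

/* Writes the lowercase keywords of `title`, concatenated, into `out`
   (capacity `cap`, must be >= 1). Rules in this sketch:
   - text inside ( ) or [ ] is skipped, e.g. "(dub remix, feat. ...)"
   - apostrophes are dropped, so "you're" becomes "youre"
   - non-ASCII bytes are dropped, so UTF-8 "pensée" becomes "pense"
   - any other non-alphanumeric byte ends the current word
   - stopwords and all-digit tokens are discarded
   - a keyword that would not fit in `out` is skipped */
void extract_keywords(const char *title, char *out, size_t cap) {
    char word[64];               /* longer words are truncated */
    size_t wlen = 0, olen = 0;
    int depth = 0;               /* bracket nesting level */
    out[0] = '\0';
    for (const char *p = title;; p++) {
        unsigned char c = (unsigned char)*p;
        if (c >= 0x80 || c == '\'') continue;     /* drop without splitting */
        if (c == '(' || c == '[') depth++;
        if (depth == 0 && isalnum(c)) {
            if (wlen + 1 < sizeof word) word[wlen++] = (char)tolower(c);
            continue;
        }
        if (wlen > 0) {          /* end of a word: keep it if it is a keyword */
            word[wlen] = '\0';
            if (!is_stopword(word) && !is_all_digits(word) && olen + wlen < cap) {
                memcpy(out + olen, word, wlen);
                olen += wlen;
                out[olen] = '\0';
            }
            wlen = 0;
        }
        if ((c == ')' || c == ']') && depth > 0) depth--;
        if (c == '\0') break;
    }
}
```

On the two example titles this yields `frenchidealistpense` and `frenchidealistpensee`, which differ by a single inserted letter, so a Levenshtein comparison of the results would rank them as a very close match.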
Would this work in my particular case? What is the industry-standard approach to this problem? (I can't be the only one in the world who wants to match slightly different song names.) Can Core Data, Cocoa, or Objective-C help me here?
Thanks a lot.