I'm not a Natural Language Processing student, yet I know this is not a trivial strcmp(n1, n2).
Here's what I've learned so far:
- comparing personal names can't be solved 100%.
- there are ways to achieve a certain degree of accuracy.
- the answer will be locale-specific, and that's OK.
I'm not looking for spelling alternatives! The assumption is that the input's spelling is correct.
For example, all the names below can refer to the same person:
- Berry Tsakala
- Bernard Tsakala
- Berry J. Tsakala
- Tsakala, Berry
I'm trying to:
- build (or copy) an algorithm which grades the relationship between 2 input names
- find an indexing method (for names in my database, for hash tables, etc.)
Note: my task isn't about finding names in text, but about comparing 2 names, e.g.
name_compare( "James Brown", "Brown, James", "en-US" ) ---> 99.0%
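
To make the goal concrete, here is a naive Python sketch of the kind of function I mean. It is only an illustration under made-up assumptions, not a solution: the NICKNAMES table is a hypothetical three-entry placeholder, single-letter initials are simply dropped, the score is plain token-set overlap (Jaccard), and the locale argument is accepted but ignored. The index_key helper is likewise hypothetical and just shows one possible order-independent key for a hash table or database index.

```python
import re

# Hypothetical, tiny nickname table. A real one would be locale-specific
# and far larger; these entries are placeholders for illustration only.
NICKNAMES = {"berry": "bernard", "jim": "james", "bill": "william"}


def name_tokens(name):
    """Reduce a name to a set of canonical lowercase tokens.

    Using a set makes word order irrelevant, so "Brown, James" and
    "James Brown" produce the same tokens.
    """
    # Keep runs of letters only; commas, periods and digits are discarded.
    tokens = re.findall(r"[^\W\d_]+", name.lower())
    # Assumption: single letters are initials (e.g. "J.") and carry no weight.
    tokens = [t for t in tokens if len(t) > 1]
    # Map known nicknames to one canonical form.
    return frozenset(NICKNAMES.get(t, t) for t in tokens)


def name_compare(n1, n2, locale="en-US"):
    """Return a rough similarity score between 0.0 and 100.0.

    The score is the Jaccard overlap of the two token sets.
    The locale parameter is ignored in this sketch.
    """
    a, b = name_tokens(n1), name_tokens(n2)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def index_key(name):
    """Build an order-independent key, usable for hashing or indexing."""
    return " ".join(sorted(name_tokens(name)))


if __name__ == "__main__":
    print(name_compare("James Brown", "Brown, James"))
    print(name_compare("Berry J. Tsakala", "Bernard Tsakala"))
    print(index_key("Tsakala, Berry"))
```

This treats every token as equally important and scores any exact token-set match as a full match, which is clearly too crude (a shared surname should probably count for more than a shared first name), so I'm looking for something better grounded.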