I need to develop an application that will index several texts, and I need to search for people’s names inside these texts. The problem is that, while a person’s correct name is “Gregory Jackson Junior”, inside the text the name might be written as:
- Greg Jackson Jr
- Gegory Jackson Jr
- Gregory Jackson
- Gregory J. Junior
I plan to index the texts on a nightly basis and build a database index to speed up the search. I would like recommendations for good books and/or good articles on the subject.
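To make the kind of matching concrete, here is a minimal sketch in Python that accepts the four variants above. Everything in it is an assumption for illustration rather than a recommended design: the helper names (`normalize`, `token_matches`, `name_matches`), the small suffix table, the 0.8 similarity threshold, and the rule that at least two tokens must match in order. It uses `difflib.SequenceMatcher` from the standard library for the typo case.

```python
from difflib import SequenceMatcher

# Hypothetical suffix table; a real one would need more entries.
SUFFIXES = {"jr": "junior", "sr": "senior"}


def normalize(name):
    """Lowercase, strip trailing punctuation, and expand known suffixes."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    return [SUFFIXES.get(t, t) for t in tokens if t]


def token_matches(candidate, target, threshold=0.8):
    """Decide whether one written token could stand for one canonical token."""
    if candidate == target:
        return True
    if len(candidate) == 1:
        # An initial such as "J." matches any token starting with that letter.
        return target.startswith(candidate)
    if len(candidate) >= 3 and target.startswith(candidate):
        # A truncated form such as "Greg" for "Gregory".
        return True
    # Otherwise fall back to a similarity ratio to tolerate typos.
    return SequenceMatcher(None, candidate, target).ratio() >= threshold


def name_matches(candidate, canonical, min_tokens=2):
    """True if every candidate token matches a canonical token, in order."""
    cand = normalize(candidate)
    targ = normalize(canonical)
    if len(cand) < min_tokens:
        return False
    pos = 0
    for tok in cand:
        # Skip canonical tokens the writer may have omitted.
        while pos < len(targ) and not token_matches(tok, targ[pos]):
            pos += 1
        if pos == len(targ):
            return False
        pos += 1
    return True


if __name__ == "__main__":
    full = "Gregory Jackson Junior"
    for written in ["Greg Jackson Jr", "Gegory Jackson Jr",
                    "Gregory Jackson", "Gregory J. Junior"]:
        print(written, name_matches(written, full))
```

This compares one candidate string against one canonical name; it does not address how to find candidate spans in a large text or how to store them in the nightly index.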
Thanks
views: 309
answers: 2
+1
A:
Check these related questions.
http://stackoverflow.com/questions/246961/algorithm-to-find-similar-text
http://stackoverflow.com/questions/338661/how-to-search-for-a-persons-name-in-a-text-heuristic
Shoban
2009-06-25 14:16:48
Thanks for the references. I did check them out prior to posting the question. The first one focused on articles and real-time search. In the second, the best answers referred to a particular database engine but had little algorithmic content.
Pascal
2009-06-25 14:39:22
+1
A:
Your question is incorrectly phrased. The examples do not indicate misspellings but variations in the way a full name is written.
And,
- would your search be expected to match on a fragment like "son" (as in "Jackson") in your example?
- would it be expected to match "Bob" when looking for someone named "Robert"?
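To make the two questions concrete, here is a minimal Python sketch, not a recommendation. It assumes matching on whole tokens rather than substrings, so "son" does not hit inside "Jackson", and it assumes a hand-built nickname table, so "Bob" can resolve to "Robert". The `NICKNAMES` table and the function names are hypothetical, and the tokenizer only handles unaccented ASCII letters.

```python
import re

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"bob": "robert", "rob": "robert", "greg": "gregory"}


def tokens(text):
    """Split text into lowercase alphabetic tokens (ASCII letters only)."""
    return re.findall(r"[a-z]+", text.lower())


def canonical(token):
    """Map a nickname to its full form, leaving other tokens unchanged."""
    return NICKNAMES.get(token, token)


def contains_name(text, first_name):
    """True if some whole token in text resolves to the given first name."""
    wanted = canonical(first_name.lower())
    return any(canonical(t) == wanted for t in tokens(text))


if __name__ == "__main__":
    print(contains_name("Bob went home", "Robert"))
    print(contains_name("Gregory Jackson", "son"))
    # A plain substring test, by contrast, does hit inside "Jackson".
    print("son" in "Gregory Jackson".lower())
```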
Ok, reading your comment suggests you do not want to venture into that.
nik
2009-06-25 14:40:58