Hi,

I'm designing an application that needs to extract people's names from short texts.

What is the best way to do that? Is there a database of names I can check the text against to find where a name occurs? Since the texts are short, this should not be very demanding in terms of processing.

Any ideas?

Thanks,

Tam

+3  A: 

You can use a statistical Named Entity Recognizer (NER), such as Stanford's NER or LingPipe's. These are machine-learning-based recognizers that do not require huge dictionaries of names as input.

Alternatively, you can get a list of person names from the Web (there are plenty) and use the Aho-Corasick string-searching algorithm to efficiently find every name from the list that occurs in the text.
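
To make the second option concrete, here is a minimal pure-Python sketch of Aho-Corasick matching against a name list. The function names (build_automaton, find_names) and the tiny sample list are made up for illustration, matching is case-sensitive, and the word-boundary check is an extra assumption of mine so that "Ann" is not reported inside "Anna". In practice you would probably reach for an existing library rather than this.

```python
from collections import deque


def build_automaton(names):
    """Build an Aho-Corasick automaton (trie + failure links) for the names."""
    goto = [{}]   # goto[state] maps a character to the next state
    fail = [0]    # fail[state] is the state to fall back to on a mismatch
    out = [[]]    # out[state] lists the names that end at this state
    for name in names:
        state = 0
        for ch in name:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = nxt
            state = nxt
        out[state].append(name)

    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # derive their failure link from their parent's failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit names that end at the fallback state (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_names(text, automaton):
    """Return (start_index, name) for each whole-word occurrence of a name."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for name in out[state]:
            start, end = i - len(name) + 1, i + 1
            # Assumption: a name only counts if it is not glued to other
            # letters or digits on either side.
            before_ok = start == 0 or not text[start - 1].isalnum()
            after_ok = end == len(text) or not text[end].isalnum()
            if before_ok and after_ok:
                hits.append((start, name))
    return hits


# Hypothetical sample data, not a real name list.
automaton = build_automaton(["Ann", "Anna", "Mark", "Al"])
hits = find_names("Anna met Mark and Al, also Annabel.", automaton)
print(hits)
```

The text is scanned once regardless of how many names are in the list, which is the point of using Aho-Corasick over looping through the list and searching for each name separately.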

JG
+1  A: 

If you're on a *nix system, try looking at /usr/share/dict/propernames. Mac OS X has it, and I think at least Ubuntu does too.

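You could use this with grep, treating each name as a fixed string (-F), matching whole words only (-w), and printing just the matched names rather than the whole line (-o):

grep -owF -f /usr/share/dict/propernames short_text.txt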
jtbandes
Ubuntu does not have propernames by default, only English words; however, you can probably add them. But yes, OS X has it.
Jeremy Morgan
+2  A: 

I found this reference: Extracting people’s names from RSS feeds using WordNet

Pierre
WordNet is great, but I'm afraid it will only be able to extract famous people's names, such as Washington.
JG
+1  A: 

How about the US Census Bureau genealogy data?
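
A rough sketch of how a surname list like that could be used, assuming you have it as CSV text with "name" and "rank" columns. That layout, the helper names (load_surnames, candidate_names), and the two-row sample below are assumptions for illustration only; check the actual file you download. The heuristic is deliberately crude: a known surname preceded by another capitalized word is reported as a candidate full name.

```python
import csv
import io
import re


def load_surnames(csv_text, max_rank=1000):
    """Return the set of upper-cased surnames ranked at or above max_rank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"].upper() for row in reader if int(row["rank"]) <= max_rank}


def candidate_names(text, surnames):
    """Return 'First Last' strings where Last is a known surname."""
    tokens = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    found = []
    # Look at each pair of adjacent words.
    for prev, word in zip(tokens, tokens[1:]):
        if word[0].isupper() and prev[0].isupper() and word.upper() in surnames:
            found.append(f"{prev} {word}")
    return found


# Hypothetical sample rows in the assumed layout, not real downloaded data.
sample_csv = "name,rank\nSMITH,1\nJOHNSON,2\n"
surnames = load_surnames(sample_csv)
names = candidate_names("Yesterday John Smith met Mary Johnson in New York.", surnames)
print(names)
```

This ignores sentence boundaries and will miss people mentioned by first name only, so treat it as a starting point rather than a full solution.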

Pratik