I'm searching for a library similar in functionality to the Perl Lingua::EN::NameParse module. Essentially, I'd like to parse strings like 'Mr. Bob R. Smith' into prefix, first name, last name, and name suffix components. Google hasn't been much help in finding something like this, and I'd prefer not to roll my own if possible. Does anyone know of an OSS Java library that can do this in a sophisticated way?

+1  A: 

Personally, I would opt for regular expressions. Here's a good intro. They're fast and concise, though they only handle the name formats you anticipate when writing the pattern.
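
To make the regex suggestion concrete, here is a minimal sketch using java.util.regex with named groups. The class name NameRegexSketch, the short title and suffix lists, and the single optional middle initial are all assumptions made for illustration; real names will need a much richer pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameRegexSketch {
    // Hypothetical, deliberately short lists of titles and suffixes.
    private static final Pattern NAME = Pattern.compile(
        "\\s*(?:(?<prefix>Mr|Mrs|Ms|Miss|Dr|Prof)\\.?\\s+)?" // optional title
        + "(?<first>[A-Za-z'-]+)"                            // first name
        + "(?:\\s+(?<middle>[A-Za-z]\\.?))?"                 // optional middle initial
        + "\\s+(?<last>[A-Za-z'-]+)"                         // last name
        + "(?:,?\\s+(?<suffix>Jr|Sr|II|III|IV)\\.?)?\\s*");  // optional suffix

    /** Returns the name parts, or an empty Optional if the input does not fit the pattern. */
    public static Optional<Map<String, String>> parse(String input) {
        Matcher m = NAME.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        Map<String, String> parts = new LinkedHashMap<>();
        for (String key : new String[] {"prefix", "first", "middle", "last", "suffix"}) {
            String value = m.group(key); // null when the optional group did not take part
            parts.put(key, value == null ? "" : value);
        }
        return Optional.of(parts);
    }

    public static void main(String[] args) {
        System.out.println(parse("Mr. Bob R. Smith"));
    }
}
```

Anything outside the pattern (a single-word name, a multi-word surname, a title not in the list) comes back as an empty Optional, which is the main weakness of this approach.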

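If you'd rather not write patterns at all, the Java SDK's StringTokenizer (or String.split) will also do for simple cases. Regular expressions are part of the SDK too, in java.util.regex.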
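
A rough sketch of the tokenizer route, for comparison. The class name NameTokenizerSketch, the prefix and suffix lists, and the rule that the first remaining token is the first name and the last remaining token is the last name are assumptions for illustration, not an established algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

public class NameTokenizerSketch {
    // Hypothetical lookup lists; extend as needed.
    private static final List<String> PREFIXES = Arrays.asList("mr", "mrs", "ms", "miss", "dr", "prof");
    private static final List<String> SUFFIXES = Arrays.asList("jr", "sr", "ii", "iii", "iv");

    /** Returns {prefix, first, middle, last, suffix}; missing parts are empty strings. */
    public static String[] parse(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, " ,"); // split on spaces and commas
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }

        String prefix = "";
        String suffix = "";
        // Peel off a known title from the front and a known suffix from the back.
        if (tokens.size() > 1 && PREFIXES.contains(normalize(tokens.get(0)))) {
            prefix = tokens.remove(0);
        }
        if (tokens.size() > 1 && SUFFIXES.contains(normalize(tokens.get(tokens.size() - 1)))) {
            suffix = tokens.remove(tokens.size() - 1);
        }

        // Assumption: first remaining token is the first name, last one is the surname,
        // and everything in between is treated as middle names or initials.
        String first = tokens.isEmpty() ? "" : tokens.remove(0);
        String last = tokens.isEmpty() ? "" : tokens.remove(tokens.size() - 1);
        String middle = String.join(" ", tokens);
        return new String[] {prefix, first, middle, last, suffix};
    }

    // Strip dots and lower-case so "Mr." and "mr" compare equal.
    private static String normalize(String token) {
        return token.replace(".", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("Mr. Bob R. Smith")));
    }
}
```

Unlike the regex version, this never rejects input; it just makes a best guess, so multi-word surnames such as "van der Berg" end up partly in the middle slot.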

A more heavyweight option is JavaCC, a Java-based parser generator. Here's a link to a tutorial.

An alternative to JavaCC is ANTLR, which I've personally had good experiences with.

Steen
+2  A: 

Maybe you could try the GATE named entity extraction component? It has built-in JAPE grammars and gazetteer lists to extract first names, last names, etc., among other things. See this page.

trex279