I am looking for a simple but "good enough" Named Entity Recognition (NER) library (and dictionary) for Java. I want to process emails and documents and extract some basic information such as names, places, addresses, and dates.

I've been looking around, and most options seem to be heavyweight, full NLP projects.

Any recommendations?

+5  A: 

You might want to have a look at one of my earlier answers to a similar problem.

Other than that, most lighter NER systems depend heavily on the domain they were built for. For example, you will find a whole lot of tools and papers about biomedical NER systems. In addition to my previous post (which already contains my main recommendation if you want to do NER), here are some more tools you might want to look into:

  • The Stanford CRF-NER
  • The Postech Biomedical NER System if you are interested in this particular domain
  • OpenCalais appears to be a commercial system. There are UIMA wrappers for OpenCalais, but they seem dated. There is also a dictionary-based Context-Mapper annotator for UIMA that may help you out. Be aware that UIMA comes with a significant learning curve ;-)
  • OpenNLP also has an NER tool.
  • Balie does NER, too, among other things.
  • ABNER does NER, but again it's focused on the biomedical domain.
  • The JULIE Lab Tools from the University of Jena, Germany, also do NER. They have standalone versions and UIMA analysis engines.

One additional remark: you won't get away without tokenizing the input. Tokenization of natural language is not entirely trivial, which is why I suggest using a toolbox that does both tokenization and NER for you.
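To illustrate why tokenization comes first, here is a minimal sketch of the simplest kind of "dictionary" NER: a regex tokenizer followed by longest-match lookup in a hand-written gazetteer, plus a pattern for ISO-style dates. This is not the approach of any tool listed above. The class name `DictionaryNer`, the token regex, the `LABEL:text` output format and the tiny gazetteer are all hypothetical, chosen only for illustration. It will miss anything not in the dictionary and only recognizes dates written as `yyyy-mm-dd`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: gazetteer-based NER, not a real library's API.
class DictionaryNer {

    // Tokens: ISO dates kept whole, words (allowing internal - or '), or single punctuation marks.
    static final Pattern TOKEN =
        Pattern.compile("\\d{4}-\\d{2}-\\d{2}|\\w+(?:[-']\\w+)*|[^\\w\\s]");
    static final Pattern ISO_DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Split raw text into tokens so that "York." becomes "York" and ".".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Gazetteer keys are lowercase phrases with single spaces, e.g. "new york" -> "PLACE".
    // At each position the longest matching phrase wins, so "New York" beats "York".
    static List<String> findEntities(List<String> tokens, Map<String, String> gazetteer) {
        int maxLen = 1;
        for (String key : gazetteer.keySet()) {
            maxLen = Math.max(maxLen, key.split(" ").length);
        }
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (ISO_DATE.matcher(tokens.get(i)).matches()) {
                found.add("DATE:" + tokens.get(i));
                i++;
                continue;
            }
            int matchedLen = 0;
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 1; len--) {
                String phrase = String.join(" ", tokens.subList(i, i + len));
                String label = gazetteer.get(phrase.toLowerCase(Locale.ROOT));
                if (label != null) {
                    found.add(label + ":" + phrase);
                    matchedLen = len;
                    break;
                }
            }
            // Skip past a match, or advance one token if nothing matched here.
            i += matchedLen > 0 ? matchedLen : 1;
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> gazetteer = Map.of(
            "new york", "PLACE",
            "york", "PLACE",
            "john smith", "PERSON");
        String email = "John Smith will be in New York on 2010-03-15.";
        // prints [PERSON:John Smith, PLACE:New York, DATE:2010-03-15]
        System.out.println(findEntities(tokenize(email), gazetteer));
    }
}
```

Without the tokenizer, a naive whitespace split would produce `2010-03-15.` with the trailing period attached and the date check would fail; the real toolkits above handle far more of these cases (abbreviations, titles, addresses), which is the main argument for using one of them.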

Aleksandar Dimitrov
+1  A: 

BTW, I recently ran across OpenCalais, which seems to have the functionality I was looking for.

webclimber