I found "Natural Language Processing with Python" today, and am wondering what other good, non-academic (the research papers tend to be too dry and/or specific to certain areas) NLP resources the SO community knows about.

I'm starting out in text processing for a couple of hobby projects, and am keen to find good places to start :)

+1  A: 

NLTK would be your obvious first stop. They have a very good collection of links on the subject (and on using the software itself).

Steen
+1  A: 

If you're prepared to spend some money on a book, then I would recommend Speech and Language Processing by Daniel Jurafsky and James H. Martin. It is an introductory-level book, but it goes into a lot of depth over a wide range of topics. If you think you might be doing text processing for a while, it'd be a good book to have on hand.

humble coffee
+9  A: 

Speech and Language Processing by Jurafsky and Martin is the standard textbook in the field. The coverage is very broad and accessible to non-linguists. Apart from some pseudocode, it’s not a practical programming book, but it will give you a good view of the field.

In the same vein, you can look at The Oxford Handbook of Computational Linguistics. This is a compilation of introductory articles.

For data mining and information retrieval, a comprehensive introduction is Introduction to Information Retrieval by Manning, Raghavan and Schütze. It is a bit academic and can be heavy on the math.

For string algorithms, as Bob Carpenter puts it, Algorithms on Strings, Trees and Sequences by Dan Gusfield "is the definitive text... The algorithms are abstracted from their biological applications, and the book would make sense without reading a single page of the biological motivations."

If you’re more interested in speech, Spoken Language Processing: A Guide to Theory, Algorithm and System Development is a nice introduction.

For Java users, an alternative to NLTK is Building Search Applications: Lucene, LingPipe, and Gate by Manu Konchady.

anno