It will be used to analyze documents on the Internet!
It uses NLTK throughout and is a great book if you want to do NLP with Python, just like the name says! :)
2009-11-07 03:13:40
Alternatively, R has many tools available for text mining, and it's easy to integrate with Python using RPy2.
Have a look at the Natural Language Processing view on CRAN. In particular, look at the tm package. Here are some relevant links:
- Paper about the package in the Journal of Statistical Software: The paper includes a nice example of an analysis of R-devel mailing list newsgroup postings from 2006.
- Package homepage:
- Look at the introductory vignette:
In addition, R provides many tools for parsing HTML or XML. Have a look at this question for an example using the RCurl and XML packages.
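If you would rather stay in Python for the HTML step, here is a rough sketch of the same idea (strip the markup, then count terms) using only the standard library. This is not the RCurl/XML approach from the linked question; the names `TextExtractor`, `html_to_text`, `term_frequencies` and the sample page are made up for illustration, and the tokenizer is a deliberately naive regex.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def term_frequencies(text):
    """Count lowercase alphabetic tokens (naive tokenizer, assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Made-up sample page; a real one could come from urllib.request.urlopen.
SAMPLE_HTML = """<html><head><title>Post</title><style>p {color: red}</style></head>
<body><p>Text mining in R.</p><p>Text mining in Python.</p>
<script>var x = 1;</script></body></html>"""

text = html_to_text(SAMPLE_HTML)
counts = term_frequencies(text)
print(text)
print(counts.most_common(3))
```

For anything beyond a quick look you would probably swap the regex for a real tokenizer (NLTK has several), but the overall shape stays the same.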
2009-10-31 17:00:20