For analyzing documents on the Internet!
It uses NLTK throughout and is a great book if you want to do NLP with Python, just like the name says! :)
ealdent
2009-11-07 03:13:40
+3
A:
Alternatively, R has many tools available for text mining, and it's easy to integrate with Python using RPy2.
Have a look at the Natural Language Processing view on CRAN. In particular, look at the tm package. Here are some relevant links:
- Paper about the package in the Journal of Statistical Software: http://www.jstatsoft.org/v25/i05/paper. It includes a nice example analyzing postings to the R-devel mailing list (https://stat.ethz.ch/pipermail/r-devel/) from 2006.
- Package homepage: http://cran.r-project.org/web/packages/tm/index.html
- Look at the introductory vignette: http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf
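If you haven't met the idea before, the central object in this kind of text mining is a term-document matrix: one row per term, one column per document, each cell a count. Here's a rough pure-Python sketch of the concept. It is not tm's API; the function name `term_document_matrix` and the letters-only tokenizer are my own simplifications for illustration.

```python
import re
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts terms[i] in docs[j]."""
    # Assumption: lowercase and keep runs of ASCII letters only.
    # A real pipeline would also drop stop words, stem, and so on.
    counts = [Counter(re.findall(r"[a-z]+", doc.lower())) for doc in docs]
    # Vocabulary is the sorted union of all terms seen in any document.
    terms = sorted(set().union(*counts))
    # Counter returns 0 for missing keys, so absent terms become zero cells.
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


docs = [
    "R has many tools for text mining",
    "Text mining with Python and R",
]
terms, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(term, row)
```

Each printed row shows how often a term occurs in each of the two sample documents. For anything beyond toy input you'd want a sparse representation rather than nested lists.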
In addition, R provides many tools for parsing HTML or XML. Have a look at this question for an example using the RCurl and XML packages.
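If you'd rather do the HTML-to-text step on the Python side, here's a minimal sketch using only the standard library's `html.parser`. This is a Python analogue of that step, not what RCurl or XML do internally; the names `TextExtractor` and `html_to_text` are made up for the example, and it assumes the page is already downloaded into a string.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
)
print(html_to_text(sample))  # prints: Title Some bold text.
```

The resulting plain text can then be handed to whatever analysis you like, whether that's NLTK or tm via RPy2.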
Shane
2009-10-31 17:00:20