Hi,

I'm looking for a package (in any language, really) that I can use on a corpus of 50 documents to compute inter-document similarity under various measures, such as TF-IDF, Okapi BM25, language models, LSA, etc.

The result I want is a document similarity matrix, i.e. doc1 is x% similar to doc2, and so on. This is for research purposes, not for production: I specifically want the similarity matrix so that I can correlate it with human ratings.

Thank you in advance!

A: 

If you know Python, you can use NLTK (http://www.nltk.org). It has everything you need, and the documentation and the Python language itself are a plus.
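To give an idea of what the similarity matrix looks like, here is a minimal sketch in plain Python of one of the measures mentioned in the question: TF-IDF weighting with cosine similarity. It is only an illustration, not NLTK's own API. The function names, the sample documents, the whitespace tokenizer and the smoothed IDF formula are all assumptions made for the example; if NLTK is installed, `nltk.word_tokenize` could probably replace the naive `tokenize` below.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch;
    # a proper tokenizer such as nltk.word_tokenize could be swapped in).
    return text.lower().split()


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict: term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF variant (assumed here) so weights stay positive.
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(docs):
    # Square matrix where entry [i][j] is the similarity of doc i to doc j.
    vectors = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vectors] for a in vectors]


if __name__ == "__main__":
    # Hypothetical sample corpus, for illustration only.
    sample_docs = [
        "the cat sat on the mat",
        "the cat lay on the rug",
        "stock prices fell sharply",
    ]
    for row in similarity_matrix(sample_docs):
        print(" ".join(f"{value:.2f}" for value in row))
```

Multiplying each entry by 100 gives the "doc1 is x% similar to doc2" reading, and flattening the upper triangle of the matrix gives one score per document pair to correlate with human ratings. Other measures (BM25, LSA, language models) would need a different weighting or model, but can fill the same kind of matrix.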

roman