How can I get the same results as Yahoo's term extraction service (http://developer.yahoo.com/search/content/V1/termExtraction.html)?
Similar questions have been asked quite a few times before:
http://stackoverflow.com/questions/1078766/best-approach-to-analyze-text-in-php
http://stackoverflow.com/questions/711062/what-is-a-good-keyword-extraction-web-service
http://stackoverflow.com/questions/465795/what-is-a-simple-way-to-generate-keywords-from-a-text
While trying to solve this with existing tools, I came across the text analysis that Solr performs on a document before indexing it, as described at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters. That analysis includes stemming as well.
So the final index will consist mostly of terms used to describe the document.
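To make concrete what I am after, here is a minimal Python sketch of that kind of analysis chain: tokenize, drop stop words, stem, then count the remaining terms. It is not Solr's implementation. The tiny stop-word list, the crude suffix stripper (a stand-in for a real stemmer such as Porter's), and all function names are my own assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real analyzers use a larger one.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for",
    "in", "is", "it", "of", "on", "or", "the", "to", "with",
}


def tokenize(text):
    """Tokenizer step: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Token filter step: drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Token filter step: strip a few common English suffixes.

    This is a rough stand-in for a real stemmer, not the Porter algorithm.
    A suffix is only removed if at least three characters remain.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_terms(text, top_n=5):
    """Run the whole chain and return the top_n (term, count) pairs."""
    tokens = remove_stop_words(tokenize(text))
    stems = [crude_stem(t) for t in tokens]
    return Counter(stems).most_common(top_n)


if __name__ == "__main__":
    print(extract_terms("Indexing the indexed documents and the document index", top_n=2))
```

With a chain like this, the most frequent stems serve as a rough keyword list for the document, which is the kind of output I would like to obtain from an existing, better-tested analyzer.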
Is there a solution that provides analyzers, tokenizers, and token filters for direct use? If Solr is the way to go, what is the best way to get this data out of Solr's index?