views: 112
answers: 2
Suppose I give you a URL...can you analyze the words and spit out the "keywords" of that page? (besides using meta-tags)

Are there good open-source summarizers out there? (preferably Python)

+2  A: 

A simple text summarizer: http://pythonwise.blogspot.com/2008/01/simple-text-summarizer.html

Algorithm:

1. For each word, calculate its frequency in the document
2. For each sentence in the document 
      score(sentence) = sum([freq(word) for word in sentence])
3. Print X top sentences such that their size < MAX_SUMMARY_SIZE
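The three steps above can be sketched in plain Python (this is my own minimal illustration, not the code from the linked blog post; sentence splitting and tokenization are deliberately naive):

```python
import re
from collections import Counter

def summarize(text, max_summary_size=200):
    """Frequency-based summarizer sketch: score sentences by summed
    word frequency, then greedily fill a character budget."""
    # Step 1: frequency of each word in the document
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    # Step 2: score each sentence as the sum of its word frequencies
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Step 3: take top-scoring sentences while total size < max_summary_size
    summary, total = [], 0
    for s in sorted(sentences, key=score, reverse=True):
        if total + len(s) <= max_summary_size:
            summary.append(s)
            total += len(s)
    # Restore original document order for readability
    return [s for s in sentences if s in summary]
```

For example, `summarize("Cats purr. Cats sleep a lot. Dogs bark. Cats and dogs play.", 40)` keeps the two sentences whose words are most frequent in the text.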
The MYYN
The problem with this is that common words like 'it' and 'and' will get priority. A better idea is to use relative frequency: take a word's frequency in the document and divide it by a value indicating how frequently that word occurs in ordinary text.
Shoko
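The relative-frequency idea in the comment above might look like this. The background table here is made up for illustration; a real implementation would derive it from a large reference corpus (or at minimum use a stop-word list):

```python
import re
from collections import Counter

# Hypothetical background frequencies for common English words.
# In practice these would come from a large corpus of ordinary text.
BACKGROUND = {"the": 50000, "a": 23000, "and": 27000, "it": 10000, "is": 9000}

def keyword_scores(text):
    """Score words by in-document frequency divided by background
    frequency, so common filler words are pushed down the ranking."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Words not in the background table get a divisor of 1,
    # i.e. they are treated as rare and score highly.
    return {w: c / BACKGROUND.get(w, 1) for w, c in counts.items()}
```

With this weighting, "cat" in "the cat and the hat" outscores "the" even though "the" occurs more often.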
+1  A: 

Frequency counts will get you some of the way, but Natural Language Processing will give better results, since it uses linguistic techniques to provide more accuracy.

Topia.termextract uses a part-of-speech (POS) tagging algorithm and is available from PyPI: http://pypi.python.org/pypi/topia.termextract/

muffinresearch