ansaurus

Question

Are there any good summarizers for a web-page?

Answer 1

+2 A:

A simple text summarizer: http://pythonwise.blogspot.com/2008/01/simple-text-summarizer.html

Algorithm:

1. For each word, calculate its frequency in the document.
2. For each sentence in the document, compute:
      score(sentence) = sum([freq(word) for word in sentence])
3. Print the top X sentences, keeping their total size < MAX_SUMMARY_SIZE.
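A minimal Python sketch of those three steps, for illustration only (it is not the code from the linked post). It assumes a naive regex sentence splitter, counts MAX_SUMMARY_SIZE in characters, and uses a hypothetical `summarize` function name:

```python
import re
from collections import Counter

def summarize(text, max_summary_size=200):
    """Frequency-based extractive summary (illustrative sketch)."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption;
    # real text needs a proper sentence tokenizer).
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

    def words(sentence):
        return re.findall(r"[a-z']+", sentence.lower())

    # 1. Frequency of each lower-cased word in the whole document.
    freq = Counter(w for s in sentences for w in words(s))

    # 2. Score each sentence as the sum of its word frequencies.
    scores = {i: sum(freq[w] for w in words(s)) for i, s in enumerate(sentences)}

    # 3. Take sentences from highest to lowest score, skipping any that would
    #    push the total size (in characters) to max_summary_size or beyond.
    chosen, size = [], 0
    for i in sorted(scores, key=scores.get, reverse=True):
        if size + len(sentences[i]) < max_summary_size:
            chosen.append(i)
            size += len(sentences[i])

    # Emit the selected sentences in their original document order.
    return " ".join(sentences[i] for i in sorted(chosen))
```

For example, `summarize("Cats are great. Cats and dogs are friends. The sky is blue.", 30)` keeps only the highest-scoring sentence that fits, "Cats and dogs are friends."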

The MYYN 2009-11-25 08:55:32

The problem with this is that common words like 'it' and 'and' will get priority. A better idea is to use relative frequency: take a word's frequency in the document and divide it by a value that indicates how frequently the word occurs in regular text.
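A small sketch of the relative-frequency idea, assuming a background frequency table for "regular text". The numbers in `BACKGROUND_FREQ` and the `DEFAULT_BACKGROUND` fallback are made up for illustration; in practice they would come from a large reference corpus. The function names are hypothetical:

```python
import re
from collections import Counter

# Hypothetical background frequencies (share of all words in regular text).
# These values are invented for illustration, not measured from a corpus.
BACKGROUND_FREQ = {"the": 0.06, "and": 0.03, "it": 0.01, "is": 0.01, "a": 0.02}
DEFAULT_BACKGROUND = 0.0001  # assumed frequency for words missing from the table

def relative_frequencies(text):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    # Relative frequency = frequency in this document / frequency in regular text.
    return {w: (c / total) / BACKGROUND_FREQ.get(w, DEFAULT_BACKGROUND)
            for w, c in counts.items()}

def score_sentence(sentence, rel_freq):
    # Same sentence score as the original algorithm, but summing relative
    # frequencies instead of raw counts.
    return sum(rel_freq.get(w, 0.0) for w in re.findall(r"[a-z']+", sentence.lower()))
```

In "the cat and the dog. the cat sleeps.", "the" occurs more often than "cat", but because "the" is also common in regular text, "cat" ends up with the much higher relative frequency, so sentences about cats outrank sentences made of filler words.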

Shoko 2009-12-04 00:09:38

Answer 2

+1 A:

Frequency counts will get you some of the way, but Natural Language Processing (NLP) will give better results, because it uses linguistic techniques such as part-of-speech tagging to improve accuracy.

Topia.termextract uses a part-of-speech (POS) tagging algorithm and is available from PyPI: http://pypi.python.org/pypi/topia.termextract/
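To show why POS tags help, here is a rough sketch of term extraction from already-tagged text: runs of consecutive nouns are grouped into candidate terms and counted. This is not topia.termextract's API or its exact algorithm. The tags below are written by hand (Penn Treebank style, where noun tags start with "NN") in place of a real POS tagger, and `extract_noun_terms` is a hypothetical name:

```python
from collections import Counter

def extract_noun_terms(tagged_tokens):
    """Count runs of consecutive nouns as candidate terms.

    tagged_tokens: list of (word, tag) pairs; noun tags are assumed to start
    with "NN" (NN, NNS, NNP, NNPS), as in the Penn Treebank tag set.
    """
    terms = Counter()
    run = []
    # A sentinel non-noun token at the end flushes the final run of nouns.
    for word, tag in tagged_tokens + [("", "END")]:
        if tag.startswith("NN"):
            run.append(word.lower())
        elif run:
            terms[" ".join(run)] += 1
            run = []
    return terms
```

Unlike raw word counts, this treats "web page summarizer" as a single term and ignores function words such as "the" entirely, which is the kind of linguistic information frequency counting alone cannot use.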

muffinresearch 2009-11-25 21:37:10
