Automatic Text Summarization
It sounds like you're interested in automatic text summarization. For a nice overview of the problem, issues involved, and available algorithms, take a look at Das and Martin's paper A Survey on Automatic Text Summarization (2007).
Simple Algorithm
A simple but reasonably effective summarization algorithm is to just select a limited number of sentences from the original text that contain the most frequent content words (i.e., the most frequent ones not including stop list words).
Summarizer(originalText, maxSummarySize):
// start with the raw freqs, e.g. [(10,'the'), (3,'language'), (8,'code')...]
wordFrequences = getWordCounts(originalText)
// filter, e.g. [(3, 'language'), (8, 'code')...]
contentWordFrequences = filtStopWords(wordFrequences)
// sort by freq & drop counts, e.g. ['code', 'language'...]
contentWordsSortbyFreq = sortByFreqThenDropFreq(contentWordFrequences)
// Split Sentences
sentences = getSentences(originalText)
// Select up to maxSummarySize sentences
setSummarySentences = {}
foreach word in contentWordsSortbyFreq:
firstMatchingSentence = search(sentences, word)
setSummarySentences.add(firstMatchingSentence)
if setSummarySentences.size() = maxSummarySize:
break
// construct summary out of select sentences, preserving original ordering
summary = ""
foreach sentence in sentences:
if sentence in setSummarySentences:
summary = summary + " " + sentence
return summary
Some open source packages that do summarization using this algorithm are:
Classifier4J (Java)
If you're using Java, you can use Classifier4J's module SimpleSummarizer.
Using the example found here, let's assume the original text is:
Classifier4J is a java package for working with text. Classifier4J includes a summariser. A Summariser allows the summary of text. A Summariser is really cool. I don't think there are any other java summarisers.
As seen in the following snippet, you can easily create a simple one sentence summary:
// Request a 1 sentence summary
String summary = summariser.summarise(longOriginalText, 1);
Using the algorithm above, this will produce Classifier4J includes a summariser.
.
NClassifier (C#)
If you're using C#, there's a port of Classifier4J to C# called NClassifier
Tristan Havelick's Summarizer for NLTK (Python)
There's a work-in-progress Python port of Classifier4J's summarizer built with Python's Natural Language Toolkit (NLTK) available here.