Clever way of building a tag cloud? - Python

+2 Q:

Hi folks,

I've built a content aggregator and would like to add a tag cloud representing the current trends.

Unfortunately this is quite complex, as I have to look for keywords that represent the context of each article.

For example, words such as "I", "was", "the", "amazing", and "nice" say nothing about the context.

Help would be much appreciated! :)

+2 A:

NLTK can help you analyze the content in order to pick out relevant terms.

Ignacio Vazquez-Abrams 2010-03-21 03:32:03

+6 A:

Use NLTK, and in particular its Stopwords corpus:

Besides regular content words, there is another class of words called stop words that perform important grammatical functions, but are unlikely to be interesting by themselves. These include prepositions, complementizers, and determiners. NLTK comes bundled with the Stopwords corpus, a list of 2400 stop words across 11 different languages (including English).

Alex Martelli 2010-03-21 03:34:16
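
To make the stop word suggestion concrete, here is a minimal sketch of how the filtering and counting might look. It is an illustration under stated assumptions, not code from the answer: the function names `load_english_stopwords` and `keyword_counts` are made up for this example, the tokenizer is a simple regular expression rather than anything NLTK provides, and the NLTK loader assumes the `nltk` package is installed and its `stopwords` data has been downloaded (typically via `nltk.download("stopwords")`).

```python
import re
from collections import Counter


def load_english_stopwords():
    """Return NLTK's English stop words as a set.

    Assumes nltk is installed and the 'stopwords' corpus has been downloaded.
    The import is done here so that keyword_counts works without NLTK too.
    """
    from nltk.corpus import stopwords
    return set(stopwords.words("english"))


def keyword_counts(text, stop_words, top_n=10):
    """Count words in text that are not stop words and return the top_n.

    Hypothetical helper: lowercases the text, splits it with a simple regex,
    and drops stop words and single-character tokens before counting.
    """
    words = re.findall(r"[a-z']+", text.lower())
    kept = [w for w in words if w not in stop_words and len(w) > 1]
    return Counter(kept).most_common(top_n)


if __name__ == "__main__":
    # A tiny hand-written stop word set keeps the demo runnable without NLTK;
    # swap in load_english_stopwords() once the corpus is available.
    demo_stop_words = {"i", "was", "at", "the"}
    print(keyword_counts("I was at the wall. The wall was amazing.", demo_stop_words))
```

The resulting (word, count) pairs could then be mapped to font sizes for the cloud. Note that "amazing" survives the filter here, which matches the point that stop word lists cover function words, not adjectives.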

@Alex: thanks for the awesome answer! But can this deal with adjectives such as **good**, **great**, etc.?

RadiantHex 2010-03-21 03:56:13

@Radiant, adjectives aren't stopwords, as they do convey meaning -- e.g., "The Great Wall" is a very specific and long wall in China, while "The Wall" is a Pink Floyd album -- etc. If you want to skip adjectives (a dubious decision), use NLTK to do "Parts-of-Speech tagging", per http://streamhacker.com/2008/11/03/part-of-speech-tagging-with-nltk-part-1/ (also read parts 2 and 3 of course).

Alex Martelli 2010-03-21 04:05:27
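
If you do decide to drop adjectives, a rough sketch of the part-of-speech approach could look like the following. The helper names `tag_words` and `drop_adjectives` are hypothetical. `nltk.word_tokenize` and `nltk.pos_tag` are real NLTK functions, but they need tokenizer and tagger data downloaded first, and the exact resource names to pass to `nltk.download` vary between NLTK versions. The sketch assumes the tagger returns Penn Treebank tags, where adjectives are `JJ`, `JJR`, and `JJS`.

```python
# Penn Treebank tags for adjectives (base, comparative, superlative).
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}


def tag_words(text):
    """Tokenize text and tag each token with its part of speech.

    Assumes nltk is installed along with its tokenizer and tagger data.
    Imported lazily so drop_adjectives can be used on pre-tagged input.
    """
    import nltk
    return nltk.pos_tag(nltk.word_tokenize(text))


def drop_adjectives(tagged_words):
    """Return the words from (word, tag) pairs whose tag is not an adjective tag."""
    return [word for word, tag in tagged_words if tag not in ADJECTIVE_TAGS]


if __name__ == "__main__":
    # Hand-tagged input so the demo runs without NLTK data;
    # with the data installed, use drop_adjectives(tag_words(some_text)).
    tagged = [("The", "DT"), ("great", "JJ"), ("wall", "NN")]
    print(drop_adjectives(tagged))
```

As the comment above points out, this turns "The Great Wall" into "The wall", so it is worth checking on real articles whether the lost meaning is acceptable.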
