I am adding blogging functionality to my website, which runs on a CMS.

My requirement is this:

When a person has written an article, he must automatically be provided with 'suggested tags'. These words must come from the article itself. How can I implement this functionality?

I have thought of some ideas:

  1. Suggest the longest words. This filters out 'a', 'of', 'my', etc., but not 'because'.
  2. Create a blacklist of words. But I couldn't find a ready-made list, and creating one myself would take a very long time.
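A minimal Python sketch of the two ideas above, assuming plain-text article bodies. The `STOPWORDS` set is a deliberately tiny, hypothetical stand-in for a real list, and the function names are illustrative only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be far longer.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "because", "but", "by",
    "for", "from", "has", "have", "he", "i", "in", "is", "it", "its",
    "my", "of", "on", "or", "she", "that", "the", "this", "to", "was",
    "were", "which", "with", "you",
}


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def longest_words(text, n=5):
    """Idea 1: the n longest distinct words (ties broken alphabetically)."""
    return sorted(set(tokenize(text)), key=lambda w: (-len(w), w))[:n]


def frequent_non_stopwords(text, n=5, min_len=3):
    """Idea 2: drop stopwords and very short words, then rank by frequency."""
    words = [w for w in tokenize(text) if w not in STOPWORDS and len(w) >= min_len]
    return [w for w, _ in Counter(words).most_common(n)]


if __name__ == "__main__":
    article = ("Python makes parsing text easy. Parsing text in Python "
               "is fun because Python is simple.")
    print(longest_words(article, n=2))           # ['because', 'parsing']
    print(frequent_non_stopwords(article, n=1))  # ['python']
```

The example output illustrates the problem described in idea 1: "because" survives a pure length filter, while even a short stopword list removes it.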

So, any other ideas?

A: 

You could try Bayesian classification and see what happens. Here's some example code.
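A minimal Python sketch of the idea, not the linked example code: a multinomial naive Bayes classifier with add-one smoothing, trained on articles that already carry tags, which then ranks those existing tags for a new article. The class and method names (`NaiveBayesTagger`, `train`, `suggest`) are hypothetical, and the training texts below are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    An article with several tags is counted once under each of its tags.
    """

    def __init__(self):
        self.tag_docs = Counter()                # tag -> number of articles with that tag
        self.word_counts = defaultdict(Counter)  # tag -> word -> occurrences
        self.total_words = Counter()             # tag -> total words seen under the tag
        self.vocab = set()
        self.n_docs = 0

    def train(self, text, tags):
        words = tokenize(text)
        self.n_docs += 1
        self.vocab.update(words)
        for tag in tags:
            self.tag_docs[tag] += 1
            self.word_counts[tag].update(words)
            self.total_words[tag] += len(words)

    def log_score(self, text, tag):
        """log P(tag) + sum of log P(word | tag), skipping words never seen in training."""
        score = math.log(self.tag_docs[tag] / self.n_docs)
        denom = self.total_words[tag] + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:
                score += math.log((self.word_counts[tag][word] + 1) / denom)
        return score

    def suggest(self, text, k=3):
        """Return the k existing tags with the highest score for this text."""
        return sorted(self.tag_docs, key=lambda t: self.log_score(text, t), reverse=True)[:k]


if __name__ == "__main__":
    tagger = NaiveBayesTagger()
    tagger.train("python list comprehension generator python", ["python"])
    tagger.train("css flexbox layout grid css", ["css"])
    tagger.train("python django orm models", ["python", "django"])
    print(tagger.suggest("How do I write a Python generator?", k=2))  # ['python', 'django']
```

Working in log space avoids floating-point underflow when an article has many words; the classifier can only suggest tags it has already seen during training.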

UPDATE: This presupposes that you have some tags for the classifier to choose from. Here is a simple algorithm for extracting keywords from text if you need to initialize your list of tags.
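One common keyword-extraction technique that could serve this purpose, and not necessarily the one the link describes, is TF-IDF: score each word of a new article by how often it appears there, weighted down by how many of the blog's existing articles also contain it. The function name `tfidf_keywords` and its parameters are hypothetical, and the sketch assumes `corpus` is a non-empty list of plain-text articles.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(doc, corpus, n=5, min_len=3):
    """Rank the words of `doc` by TF-IDF against `corpus`, a non-empty list of other articles.

    A word found in every corpus article gets IDF log(1) = 0 and is dropped,
    so common words such as 'the' or 'because' need no hand-made blacklist.
    """
    corpus_sets = [set(tokenize(d)) for d in corpus]
    counts = Counter(w for w in tokenize(doc) if len(w) >= min_len)
    total = sum(counts.values())
    scores = {}
    for word, c in counts.items():
        df = sum(1 for s in corpus_sets if word in s)      # document frequency
        idf = math.log((1 + len(corpus_sets)) / (1 + df))  # smoothed inverse document frequency
        scores[word] = (c / total) * idf                   # term frequency x IDF
    # Highest score first; ties broken alphabetically; zero-score words removed.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return [w for w in ranked if scores[w] > 0][:n]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat because it was tired",
        "the dog chased the ball because it was bored",
        "the weather was nice because the sun was out",
    ]
    new_post = "the kubernetes cluster was slow because the kubernetes scheduler was busy"
    print(tfidf_keywords(new_post, corpus, n=3))  # ['kubernetes', 'busy', 'cluster']
```

The top-scoring words across your existing articles could then seed the initial tag list for a classifier.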

Hank Gay