Hi all,

First, I need to programmatically extract tags (unlike what is done here on Stack Overflow) from the titles and descriptions of my posts. I don't want common words to appear as keywords. Is there an efficient way of doing this?

After getting good tags, I would like to save them in a MySQL database.

Now, how do I efficiently get related posts using these automatically created tags? For example, the way it is done here on Stack Overflow.

+4  A: 

Look up tf-idf. You're looking for terms with a high tf-idf score.
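For reference, here is a minimal tf-idf sketch in Python (standard library only). It assumes each post's title and description have already been joined into one string; `STOPWORDS`, `tokenize` and `tfidf_tags` are hypothetical names, and the tiny stopword list is only a placeholder. A term's score is its frequency within the post times log(N / df), where N is the number of posts and df is the number of posts containing the term, so a word that appears in every post scores zero and is dropped.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
             "is", "it", "how", "do", "i", "my", "with", "this", "that"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z][a-z0-9+#]*", text.lower())
    return [w for w in words if w not in STOPWORDS]

def tfidf_tags(docs, top_k=3):
    """Return up to top_k positive-scoring tf-idf terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    result = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically so output is stable.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append([t for t in ranked if scores[t] > 0][:top_k])
    return result

posts = [
    "How to parse JSON in Python",
    "Parse XML in Java",
    "Python list comprehension tutorial",
]
print(tfidf_tags(posts))
# [['json', 'parse', 'python'], ['java', 'xml', 'parse'], ['comprehension', 'list', 'tutorial']]
```

Terms that occur in fewer posts (like `json`) outrank terms shared across posts (like `parse`), which is the "high tf-idf score" the answer refers to.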

Nicolas78
@Getr G, wow, that looks very complicated, but it seems to be the best way. Are there any simpler ways of doing this? How is it done here on Stack Overflow? Do you know?
Sir Lojik
@Sir: it's done *manually* on SO. Like how you did when you asked this question, and how I did when I just removed two bogus tags from your question.
Shog9
@Shog9 I got it already! You're late.
Sir Lojik
+2  A: 

I would advise against using this method. You can use it to suggest tags, but automatic tagging will be very, very hard to implement correctly and accurately.

One reason is that computers don't understand semantics. Take any question here and try tagging it that way: it will fail about 95% of the time.

NullUserException
Okay, I get it. I'll have to forget about that feature.
Sir Lojik
+2  A: 

I'm guessing an online API service might help. Check out:

OpenCalais - try pasting an article here: http://viewer.opencalais.com/

Or Yahoo's Term Extraction API: http://developer.yahoo.com/search/content/V1/termExtraction.html

Hope this helps!

Amer
+1  A: 

I don't see how this would be possible without some sort of list. How would your app know which words to use and which to skip? I suppose you could find a thesaurus with an API and use it to find tags, but that would get rather complex. If you're doing this for SEO reasons, you could make the app look for words from a keyword list, such as one you get from the Google Keyword Tool.
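A minimal sketch of the list-based approach, written in Python here although the same logic carries over to PHP or Perl. `STOPWORDS` and `KEYWORDS` are hypothetical placeholder lists (in practice you might load a published stopword list and a keyword list exported from a keyword tool), and `extract_tags` is an illustrative name, not a library function. Stopwords are always dropped; if a keyword list is supplied, only words on it are kept.

```python
import re

# Hypothetical placeholder lists for illustration only.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "for",
             "is", "how", "do", "i", "my", "with"}
KEYWORDS = {"php", "perl", "mysql", "regex", "seo", "tags"}

def extract_tags(text, stopwords, keywords=None):
    """Return unique candidate tags from text, in order of first appearance.

    Words on the stopword list are always dropped. If a keyword list is
    given, only words on that list are kept (the "whitelist" approach).
    """
    tags = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word in stopwords:
            continue
        if keywords is not None and word not in keywords:
            continue
        if word not in tags:
            tags.append(word)
    return tags

title = "How do I store tags in MySQL with PHP?"
print(extract_tags(title, STOPWORDS))            # ['store', 'tags', 'mysql', 'php']
print(extract_tags(title, STOPWORDS, KEYWORDS))  # ['tags', 'mysql', 'php']
```

The stopword-only mode still lets generic words like `store` through, which is why a keyword list gives tighter tags.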

As for how to do this: I use PHP all the time and think it's great for building web apps, but for this sort of thing (processing lots of text data, regex, etc.) I tend to run into problems with PHP. Maybe it's just me, but I prefer Perl.

Rick