Hello,

I would like to show trending tags on my website based on the searches users make. The problem is that I don't see a simple way to extract the important terms from a search string. For example, many users might search for "visual studio" for different purposes: "visual studio 2010", "visual studio unit testing", "visual studio web forms components". In those three searches, "visual studio" is trending. How can an algorithm notice that, given that "visual studio" will in most cases be mixed with many other words?

Thank you!

+1  A: 

Have a look at this CodePlex project:

http://www.codeplex.com/TheTagCloud

It includes a function that takes an HTML file as input and returns a tag cloud.

Chris Ballance
TheTagCloud seems to be based on the tags already being known. I think this question is trying to find out how best to build the tag set from the content.
Stephen Doyle
@Stephen The example only shows creation from a list of strings. The project also supports an HTML page as input.
Chris Ballance
Yes, I've actually written something like TheTagCloud. What's harder to find is how to extract precise tags from the searches made on a website.
JP Araujo
+2  A: 
  1. Split every search query into an array of single words.
  2. Calculate the distance between the words in each pair (the nearer, the better => higher value).
  3. Add up this word-distance value for each word pair across all queries.

The word pairs with the highest values are your "trending tags".
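
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not a tested implementation: the function name `trending_pairs`, the `top_n` parameter, and the `1 / distance` weighting are my own assumptions (the answer only says that nearer words should get a higher value), and the sample queries are the ones from the question.

```python
from collections import Counter
from itertools import combinations


def trending_pairs(queries, top_n=5):
    """Score every word pair by proximity, summed over all queries."""
    scores = Counter()
    for query in queries:
        # Step 1: split the query into single words.
        words = query.lower().split()
        # Step 2: combinations() keeps positional order, so j - i >= 1.
        for (i, first), (j, second) in combinations(enumerate(words), 2):
            if first == second:
                continue
            # Assumed weighting: adjacent words score 1, one word apart 1/2, ...
            # Step 3: add the value to the running total for this pair.
            scores[(first, second)] += 1.0 / (j - i)
    return scores.most_common(top_n)


queries = [
    "visual studio 2010",
    "visual studio unit testing",
    "visual studio web forms components",
]
print(trending_pairs(queries, top_n=1))
# prints [(('visual', 'studio'), 3.0)]
```

Note that the pair key keeps the order in which the words appear, so "studio visual" would be counted separately from "visual studio"; sorting the two words before counting would merge them, if that is what you want.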

Martin Hohenberg
That sounds interesting. Would you give more details on this? Is the idea to compare all possible pairs, or only each word with the others that come after it, up to the end of the array?
JP Araujo
I used to do this in a many-to-many way: compare any two words within a given string. This makes sense once you realize that "prices visual studio" also contains "visual studio" as a "trending tag". On the other hand, you could compare only the first word with each nth word, which saves computing time but disregards "trailing tags" (seriously, there *must* be a better term ;) ) that appear later in the string.
Martin Hohenberg
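
To make the difference between the two comparison strategies in the comment above concrete, here is a small sketch. The helper names `all_pairs` and `first_word_pairs` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations


def all_pairs(words):
    """Many-to-many: pair every word with every word that follows it."""
    return list(combinations(words, 2))


def first_word_pairs(words):
    """Cheaper variant: pair only the first word with each later word."""
    return [(words[0], other) for other in words[1:]] if words else []


words = "prices visual studio".split()
print(all_pairs(words))
# prints [('prices', 'visual'), ('prices', 'studio'), ('visual', 'studio')]
print(first_word_pairs(words))
# prints [('prices', 'visual'), ('prices', 'studio')]
```

For a query of n words, the many-to-many variant produces n(n-1)/2 pairs while the first-word variant produces n-1, which is where the time saving comes from. As the output shows, the cheaper variant never sees the ("visual", "studio") pair in this query.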