I am developing a web application where each user has a collection of tags. I need to create a suggestion list for users based on the similarity of their tags.
For example, when a user logs in, the system takes his tags, searches the user database for them, and shows users who have similar tags. For instance, if User 1 has the tags [Linux, Apache, MySQL, PHP] and User 2 has [Windows, IIS, PHP, MySQL], the system reports that User 2 matches User 1 with a weight of 50%, because they share 2 of User 1's 4 tags (PHP and MySQL).
But imagine that User 1 has [ASP, IIS, MS Access] and User 2 has [PHP, Apache, MySQL]. In this situation my system doesn't suggest User 2 as a "friend" to User 1, or vice versa. Yet we know that these two users work in a similar field: both work on web technology (or web programming, etc.).
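To make the weighting concrete, here is a minimal sketch of the exact-match overlap described above. The function name `overlap_weight` is hypothetical, and dividing by the number of the first user's tags is an assumption chosen to reproduce the 50% figure; the real system may normalize differently.

```python
def overlap_weight(own_tags, other_tags):
    """Share of own_tags that also appear in other_tags (0.0 to 1.0).

    Assumption: the weight is relative to the first user's tag count,
    and tags are compared case-insensitively.
    """
    own = {t.lower() for t in own_tags}
    other = {t.lower() for t in other_tags}
    if not own:
        return 0.0
    return len(own & other) / len(own)


user1 = ["Linux", "Apache", "MySQL", "PHP"]
user2 = ["Windows", "IIS", "PHP", "MySQL"]
print(overlap_weight(user1, user2))  # 0.5 (PHP and MySQL out of 4 tags)

# The problem case: no literal tag in common, so the weight is zero.
print(overlap_weight(["ASP", "IIS", "MS Access"],
                     ["PHP", "Apache", "MySQL"]))  # 0.0
```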
That is why I need some kind of taxonomy of computer science (for now; later I will probably need taxonomies of other fields too, such as medicine, physics, mathematics, etc.) in which these concepts are categorized, so that when I check the similarity of ASP and PHP, for example, it can tell me that they are similar and belong to one group (or category).
I hope I have described my problem clearly. If something is explained badly, I would be happy to get your corrections.
Thanks

+2  A: 

If these terms appear in a forum or something similar, you can use Latent Semantic Analysis to construct clusters of terms.

vartec
+2  A: 

Generate some using Google Sets? It would be hard to get a larger data set than that:

http://labs.google.com/sets

scomar
I don't see how this tool could help me. Do you know of any APIs for it? And how would it benefit me?
Bakhtiyor
There's no official API, but there are some third-party ones people have hacked together (try googling). They don't have to be particularly reliable anyway, as you only need to run a few queries and put the results in your database. If you read the description of the tool, you'll see it does exactly what you require: from a few similar words, it extrapolates to find a larger group of similar words.
scomar
+1  A: 

You need to create relationships between tags. I don't believe this can be done automatically. You have to create a database which says sql=mysql=postgresql=oracle, asp=jsp=php and so on. This way you create a kind of tag group. A tag can of course belong to several relationships.
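As a rough sketch of that idea, users can be compared by the groups their tags fall into rather than by the tags themselves. Everything named here is hypothetical: the `TAG_GROUPS` table, the group names, and the extra members ("ms access", "apache", "iis") are assumptions added to cover the example from the question. In practice the table would live in the database.

```python
# Hypothetical hand-maintained groups; a tag may appear in several groups.
TAG_GROUPS = {
    "relational-db": {"sql", "mysql", "postgresql", "oracle", "ms access"},
    "server-side-web": {"asp", "jsp", "php"},
    "web-server": {"apache", "iis"},
}


def groups_of(tags):
    """Names of all groups that contain at least one of the given tags."""
    wanted = {t.lower() for t in tags}
    return {name for name, members in TAG_GROUPS.items() if wanted & members}


def group_weight(own_tags, other_tags):
    """Share of the first user's groups that the second user also covers."""
    own = groups_of(own_tags)
    other = groups_of(other_tags)
    if not own:
        return 0.0
    return len(own & other) / len(own)


# No tag in common, but every group matches.
print(group_weight(["ASP", "IIS", "MS Access"],
                   ["PHP", "Apache", "MySQL"]))  # 1.0
```

Tags that are in no group are simply ignored here, so a real system would probably combine this score with the exact-match weight.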

codymanix
+2  A: 

I don't think you actually need a taxonomy. With enough data, you should be able to do cluster analysis on the fields and infer the relationships between the tags. See this paper on automated tag clustering for some details. If you don't think that tag clustering and analysis based on tags can get you as far as you want, look at Flickr.
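To give a feel for how relationships can be inferred from the data alone, here is a minimal sketch. It is not the method from the paper and not a full clustering algorithm, only the kind of similarity measure a clustering step could build on: each tag is described by the tags it co-occurs with across users, and two tags are compared by cosine similarity. The sample users and the function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_vectors(user_tags):
    """Map each tag to counts of how often it shares a user with each other tag."""
    vectors = defaultdict(lambda: defaultdict(int))
    for tags in user_tags:
        for a, b in combinations(sorted(set(tags)), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(count * v.get(tag, 0) for tag, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up sample data: "php" and "asp" never appear on the same user,
# but both appear alongside "html", "javascript" and "css".
users = [
    ["php", "html", "javascript"],
    ["asp", "html", "javascript"],
    ["php", "html", "css"],
    ["asp", "css", "javascript"],
    ["numpy", "statistics", "r"],
    ["statistics", "r", "matlab"],
]

vectors = cooccurrence_vectors(users)
print(cosine(vectors["php"], vectors["asp"]))  # high: shared neighbours
print(cosine(vectors["php"], vectors["r"]))    # 0.0: no shared neighbours
```

With real data, the pairwise similarities could be fed into any standard clustering method to get groups of related tags.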

Alternatively, if you do think a taxonomy is required, consider using SKOS. If you can map your tags to SKOS, then you can perform this kind of analysis on them. Two sources of SKOS data you may find particularly useful are Library of Congress Subject Headings and DbPedia. If you have more questions about using SKOS, try SemanticOverflow.

Tom Morris