I am developing a web application where each user has a collection of tags. I need to build a suggestion list for users based on the similarity of their tags.
For example, when a user logs in, the system takes his tags, searches for them among the other users in the DB, and shows the users who have similar tags. For instance, if User 1 has the tags [Linux, Apache, MySQL, PHP] and User 2 has [Windows, IIS, PHP, MySQL], it says that User 2 matches User 1 with a weight of 50%, because they share 2 of 4 tags (PHP and MySQL).
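To make the 50% figure concrete, here is a minimal sketch of the matching described above. The function name `tag_match` and the case-insensitive comparison are assumptions made for illustration, not part of the actual system.

```python
def tag_match(own_tags, other_tags):
    """Return the fraction of own_tags that also appear in other_tags."""
    # Lower-casing is an assumption so that "PHP" and "php" count as one tag.
    own = {tag.lower() for tag in own_tags}
    other = {tag.lower() for tag in other_tags}
    if not own:
        return 0.0
    return len(own & other) / len(own)


user1 = ["Linux", "Apache", "MySQL", "PHP"]
user2 = ["Windows", "IIS", "PHP", "MySQL"]
print(tag_match(user1, user2))  # 0.5, i.e. 2 shared tags out of 4
```

Because only exact tag names are compared, two users with no tag in common always score 0.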
But imagine a situation where User 1 has [ASP, IIS, MS Access] and User 2 has [PHP, Apache, MySQL]. In this situation my system doesn't suggest User 2 as a "friend" to User 1, or vice versa. But we know that these two users are similar in their field of work: both work on web technology (or web programming, etc.).
That is why I need some kind of taxonomy of computer science (for now; later I would probably need taxonomies of other fields too, such as medicine, physics, mathematics, etc.) in which these concepts are categorized, so that when I check the similarity of ASP and PHP, for example, it can tell me that they are similar and belong to one group (or category).
I hope I have described my problem clearly; if something is explained badly, I would be happy for your corrections.
Thanks
views: 116
answers: 4
If these terms appear in a forum or some similar body of text, you can use Latent Semantic Analysis to construct clusters of terms.
Generate some using Google Sets? It would be hard to get a larger data set than that.
You need to create relationships between tags. I don't believe this can be done automatically. You have to create a database which says sql=mysql=postgresql=oracle, asp=jsp=php and so on. This way you create a kind of tag group. A tag can of course belong to more than one group.
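A minimal sketch of that idea, assuming the groups are kept in a plain dictionary. The two groups come from the examples above; the group names, the `TAG_GROUPS` table and both function names are hypothetical.

```python
# Hand-maintained tag groups. The group names are made up for illustration.
TAG_GROUPS = {
    "database": {"sql", "mysql", "postgresql", "oracle"},
    "server-side scripting": {"asp", "jsp", "php"},
}


def groups_of(tags):
    """Return the names of all groups that contain at least one of the tags."""
    wanted = {tag.lower() for tag in tags}
    return {name for name, members in TAG_GROUPS.items() if wanted & members}


def group_match(own_tags, other_tags):
    """Return the fraction of the first user's groups shared by the second."""
    own = groups_of(own_tags)
    if not own:
        return 0.0
    return len(own & groups_of(other_tags)) / len(own)


print(group_match(["ASP", "IIS", "MS Access"], ["PHP", "Apache", "MySQL"]))  # 1.0
```

The two users from the question now match through the shared scripting group, although they have no tag in common. Tags that are in no group (IIS, Apache, MS Access here) are simply ignored, so the result is only as good as the table.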
I don't think you actually need a taxonomy. With enough data, you should be able to do cluster analysis on the fields and infer the relationships between the tags. See this paper on automated tag clustering for some details. If you don't think that tag clustering and analysis based on tags can get you as far as you want, look at Flickr.
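As a rough illustration of what clustering on tags could look like (this is not the method from the paper mentioned above), the sketch below links two tags when enough of the users who have either one have both, then merges linked tags into clusters. The function name, the threshold of 0.5 and the sample profiles are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cluster_tags(profiles, threshold=0.5):
    """Group tags whose co-occurrence (Jaccard index over users) >= threshold."""
    # Map each tag to the set of users (by list index) that carry it.
    users_with = defaultdict(set)
    for user_id, tags in enumerate(profiles):
        for tag in tags:
            users_with[tag.lower()].add(user_id)

    # Union-find structure: every tag starts out as its own cluster.
    parent = {tag: tag for tag in users_with}

    def find(tag):
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]  # path halving
            tag = parent[tag]
        return tag

    # Merge the clusters of every pair of tags that co-occur often enough.
    for a, b in combinations(sorted(users_with), 2):
        shared = len(users_with[a] & users_with[b])
        total = len(users_with[a] | users_with[b])
        if shared / total >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for tag in users_with:
        clusters[find(tag)].add(tag)
    return list(clusters.values())


# Hypothetical sample data: one tag list per user.
profiles = [
    ["PHP", "MySQL", "Apache"],
    ["PHP", "MySQL"],
    ["PHP", "Apache", "MySQL"],
    ["ASP", "IIS"],
    ["ASP", "IIS", "MS Access"],
]
for cluster in cluster_tags(profiles):
    print(sorted(cluster))
```

On this sample the tags fall into two clusters, one around PHP and one around ASP. Note that co-occurrence groups tags that are used together, so ASP and PHP only end up in the same cluster if some users carry tags from both stacks.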
Alternatively, if you do think a taxonomy is required, consider using SKOS. If you can map your tags to SKOS concepts, then you can perform this kind of analysis on them. Two sources of SKOS data you may find particularly useful are the Library of Congress Subject Headings and DBpedia. If you have more questions about using SKOS, try SemanticOverflow.
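SKOS links a concept to more general ones through `skos:broader`, so once the tags are mapped, two tags can be compared by looking for the nearest concept they both fall under. The sketch below uses a tiny hand-written `BROADER` table in place of real SKOS data; every entry in it is hypothetical, and it assumes one broader concept per entry and no cycles, which real SKOS data does not guarantee.

```python
# Hypothetical stand-in for skos:broader links loaded from a real vocabulary.
BROADER = {
    "php": "web programming",
    "asp": "web programming",
    "web programming": "computer science",
    "mysql": "databases",
    "databases": "computer science",
}


def ancestors(concept):
    """Return the concept followed by its chain of broader concepts."""
    chain = [concept]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def nearest_shared_concept(a, b):
    """Return the most specific concept covering both tags, or None."""
    seen = set(ancestors(a.lower()))
    for concept in ancestors(b.lower()):
        if concept in seen:
            return concept
    return None


print(nearest_shared_concept("ASP", "PHP"))    # web programming
print(nearest_shared_concept("ASP", "MySQL"))  # computer science
```

The depth of the shared concept could serve as a weight: a match at "web programming" is closer than one that only meets at "computer science".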