I have a real question.
I have a database with the following schema:
item
- id
- description
- other junk
tag
- id
- name
item2tag
- item_id
- tag_id
- count
Basically, each item has up to 10 tags, each with its own count. There are 50,000 items and 50,000 tags, and about 500,000 entries in item2tag. Given one item, I'd like to find the "most similar" item.
By "most similar" I mean the item whose combination of tags is closest in proportion: if something is "cool" twice as much as it is "funny," I want to find all other things that are also roughly twice as "cool" as they are "funny." Of course, this should work across all 10 tags, not just 2.
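To make that concrete, here is a rough sketch of what I mean. It assumes cosine similarity over the tag counts is a fair reading of "same proportions" (that is my assumption, not a settled choice). The sample rows and the names build_vectors, cosine and most_similar are made up for illustration, and the rows stand in for whatever a SELECT on item2tag would return.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical rows from item2tag: (item_id, tag_id, count).
# Tag 10 stands for "cool", tag 11 for "funny", tag 12 for something else.
rows = [
    (1, 10, 4), (1, 11, 2),  # item 1: twice as "cool" as "funny"
    (2, 10, 2), (2, 11, 1),  # item 2: same 2:1 ratio, smaller counts
    (3, 10, 1), (3, 11, 3),  # item 3: mostly "funny"
    (4, 12, 5),              # item 4: shares no tags with item 1
]


def build_vectors(rows):
    """Group rows into one sparse vector per item: {item_id: {tag_id: count}}."""
    vectors = defaultdict(dict)
    for item_id, tag_id, count in rows:
        vectors[item_id][tag_id] = count
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity of two sparse tag vectors; 1.0 means identical proportions."""
    dot = sum(count * b[tag] for tag, count in a.items() if tag in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(item_id, vectors):
    """Return the id of the other item with the highest cosine similarity."""
    target = vectors[item_id]
    candidates = [other for other in vectors if other != item_id]
    return max(candidates, key=lambda other: cosine(target, vectors[other]))


vectors = build_vectors(rows)
print(most_similar(1, vectors))
```

With these sample rows, item 2 should come out as the closest match to item 1, because the counts differ but the ratio is the same. This version compares the target against every other item, so it is only a statement of the goal, not necessarily something that scales well to the real tables.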
Any ideas?