tags:

views:

205

answers:

2

Well I run a small video website and on the actual video page there is a strip of "related videos" similar to most video sides (e.g. YouTube) and currently all I'm doing is taking one of its tags randomly and finding other videos with the same tag. Not surprisingly this isn't a great method as some tags are very vague and some videos are mis-tagged.

Example of the current query:

SELECT video_name FROM videos INNER JOIN videotags ON videos.id=videotags.video_id INNER JOIN tags ON tags.id=videotags.tag_id WHERE tag_name='x' AND videos.id<>'y' LIMIT 5

Where x is any one of the tags from the current video and y is the ID from the current video. (P.S. I'm using parameterized queries don't worry)

I'm just curious as to how you all would handle this, maybe it would be better to incorporate similar video titles?

Here is how my database tables are setup:

VIDEOS TABLE
------------
video_id [PK,auto_increment] int(11)
video_name varchar(255)

TAGS TABLE
----------
tag_id [PK,auto_increment] int(11)
tag_name varchar(255)

VIDEOTAGS TABLE
---------------
tag_id [PK,FK] int(11)
video_id [PK,FK] int(11)

There's obviously more columns in the videos table but this just illustrates the simple many-to-many relationship with auto-incrementing primary keys on both sides

The site is built on PHP with a MySQL database, but that really doesn't matter :)

EDIT: There's been some talk of going down an organic route so I figure I'd post my other two tables that are semi-related that are to do with video views and video ratings. Now note I don't have any intention of adding more columns specifically to the video views table because of privacy issues (yes I know I store IPs in the rating table)

VIDEOVIEWS TABLE
----------------
video_id [FK] int(11)
view_time datetime

VIDEORATINGS TABLE
------------------
video_id [PK,FK] int(11)
ip_address [PK] varchar(15)
rating int(1)
rate_time datetime
A: 

Very interesting question.

This is just thinking out loud, but some options I can think of are:

1) Use all the tags - for example imagine queries for the list of videos that have each tag that this video does. Produce a list of videos ordered by the count of the number of those lists they appear on, i.e. the count of how many tags they have in common with this video. Ones with more tags in common are presumably 'more related'.

(I'm not suggesting you do multiple queries in reality, just trying to explain what I have in mind...someone with better SQL-fu than me can probably come up with a single query that does this. Perhaps you can additionally order by popularity or other information that you may have).

2) Try to come with an algorithm that lets the related videos naturally emerge, a la amazon 'people who bought this also bought this'. For example if you track who viewed what, you may be able to design a query that produces such a list.

frankodwyer
Idea one is basically what I was thinking but I have no idea how make that translate into SQL. As you put it, my SQL-fu is not strong enough
Andrew G. Johnson
Well, this is a good place to ask...perhaps ask for ideas on that specific query as a separate SQL question, and link it to this one? But I think it's worth keeping this question too, it's a good one.
frankodwyer
+1  A: 

This query should return the id's of videos (v2) that have tags in common with your given video (v1), in descending order of the number of tags in common.

SELECT v2.video_id
FROM VideoTags AS v1
  JOIN VideoTags AS v2
  USING (tag_id)
WHERE v1.video_id = ?
  AND v1.video_id <> v2.video_id
GROUP BY v2.video_id 
ORDER BY COUNT(*) DESC;
Bill Karwin
You could also add a LIMIT 5 (to limit the number of related videos to 5 for example) and change the last line to ORDER BY COUNT(*) DESC, RAND() to get random videos each time when they have the same score.
lpfavreau
Gave this best answer as it gets the job done, was hoping to get a few other opinions but whatevz :)
Andrew G. Johnson