views:

446

answers:

4

I have a small site where I want to get Related Videos on basis of Tags... what could be the best MS SQL 2005 query to get related Videos on basis of Tags. If you can give LINQ query that would be awsome.

Here is Database Schema:

CREATE TABLE Videos
    (VideoID bigint not null , 
    Title varchar(100) NULL, 
    Tags varchar(MAX) NULL, 
    isActive bit NULL  )

INSERT INTO Videos VALUES ( 1,'Beyonce Shakira - Beautiful Liar','shakira, beyonce, music, video',1)
INSERT INTO Videos VALUES ( 2,'Beyonce Ego Remix','beyonce, music, video',1)
INSERT INTO Videos VALUES ( 3,'Beyonce Ego','beyonce, music, video',1)

What I want that on Viewing of Video with ID 1 it should show related videos on basis of its tags and most matched terms should come on top.

Thanks in Advance

A: 

You would be better off splitting the schema so that the tags are in a separate table and are then linked to the videos using an intermediate table an example of this could be...

select v.*
from Video v
  inner join VideoTag vt 
    inner join Tag t on vt.TagID = t.TagID
  on v.VideoID = vt.VideoID 
where t.Description = @tagText

where the revised schema looks like

Video

VideoID
Title
Description

Tag

TagID
Description

VideoTag

VideoID
TagID

Alternatively, you could try using a simpler query like

select VideoID, Title, Description
from Video
where Tags like '%' + @tag + '%'

but this would match on tags that contain other tags (such as 'art' and 'martial art') which is why I believe the splitting of the schema to be a better solution.

Martin Robins
It'd be even better if there was a `TagAffinity` column in the VideoTag table, making it easier to select which videos were most suited to a given query (assuming the OP is limited to SQL statements and can't use basket analysis)
Jeremy Smyth
where Tags = @tag or Tags like '%,' + @tag or Tags like @tag + ',%' or Tags like '%,' + @tag + ',%'
Stobor
A: 

If you're going on the basis of tags, you'll just want a WHERE tag = 'thistag' condition (if there's one tag in the same table), or WHERE tag in (SELECT tag FROM tags_table WHERE video_id = this_video_id) if you normalise the tags out of your videos table, although either solution will probably return a lot of videos, so you'd have to prune it somehow.

That's where it gets interesting (and difficult); you'd not only need to store a set of tags for each video, but also an affiliation score for each tag-to-video relationship. And that gets messy and subjective.

On the other hand, if you really want the "most matched terms" (ask mentioned in your edit), what I really think you need is a data mining query. Basket analysis is the technique usually used to show "related items" when people are interested in looking at things that other people interested in this item are also interested in. That's a bit beyond a SQL query, but if you have SQL Server 2005, it's part of the Analysis Services package. Worth a look!

Edit: now that you've posted the schema, I'd strongly suggest you normalise your Tags field into another table. It's very hard to code around a field that contains several delimited items, and is addressed by 1st form normalisation.

Jeremy Smyth
You Mean If I use this StructureTable Videos - VideoID - Title - isActiveTags Table - TagID - TagVideoTags Table - VideoID - TagIDthen what will be the query?
Marc V
A: 

If I have understood your question correctly in that you have a table of videos and in that table there is a column 'tags' with a load of tags separated with a comma; then here is the LINQ query...

dbDataContext db = new dbDataContext();

var movies =
    from v in db.Videos
    where v.Tags.Contains("Thriller")
    select v;

I made a quick table with VideoId, Name and Tags. I added a movie 'The Matrix' and put tags 'Thriller;Action;Drama' - that query found The Matrix record.

As a side note, would it not be better to have a table of tags, then an table that houses VideoId's and what tags belong to them?

Just a thought. Hope that helps any way.

Chalkey
How to read all the tags from Source Video and find all Videos on basis of all tags? And most matched terms come on top.
Marc V
+5  A: 

The schema you show, denormalized with all tags for each video stuffed into the Tags string, is badly designed for your purposes -- there is no reasonable way in TSQL to compute a meaningful "commonality" between two strings in such format, and therefore no reasonable way to check what pairs of items have relatively high commonality and thus may be deemed "related". If the schema is untouchable, you'll have to implement a user-defined function (in C# or other .NET language) for the purpose, and even then you will more or less have to scan the whole table since there's no reasonable way to index on such a basis.

If you can redesign the schema (with two more tables: one to hold the tags, and one to give the many-many relationship between tags and videos) there may be better prospects; in this case, some indication on roughly how many (order of magnitude) videos you expect to hav, how many (ditto) distinct tags overall, and roughly what number of tags a video would be expected to have, might allow designing and effective way to pursue your purposes.

Edit: per comments, apparently the schema can be redesigned, although still no indication was given about the numbers I asked, so appropriate indices &c will remain a total mystery. Anyway, suppose the schema is something like (each table can have other columns as desired, just add them to the query; and the VARCHAR lenghts don't matter either):

CREATE TABLE Videos (VideoID INT PRIMARY KEY,
                     VideoTitle VARCHAR(80));

CREATE TABLE Tags (TagID INT PRIMARY KEY,
                   TagText VARCHAR(20));

CREATE TABLE VideosTags (VideoID FOREIGN KEY REFERENCES Videos,
                         TagID FOREIGN KEY REFERENCES Tags,
                         PRIMARY KEY (VideoId, TagId));

i.e. just the classic 'many-many relationship' textbook example.

Now given the title of a video, say @MyTitle, titles of the 5 videos most "related" to it could easily be queried by, for example:

WITH MyTags(TagId) AS
(
  SELECT VT1.TagID
  FROM Videos V1
  JOIN VideosTags VT1 ON (V1.VideoID=VT1.VideoID)
  WHERE V1.VideoTitle=@MyTitle
)
SELECT TOP(5) V2.VideoTitle, COUNT(*) AS CommonTags
FROM Videos V2
JOIN VideosTags VT2 ON (V2.VideoID=VT2.VideoID)
JOIN MyTags ON (VT2.TagId=MyTags.TagId)
GROUP BY V2.VideoId
ORDER BY CommonTags DESC;
Alex Martelli
Lets say if I can redesign the schema then What will be the SQL query?
Marc V
If you can redesign the schema, then please see my answer above.
Martin Robins