views:

58

answers:

1

I am working at an movie recommendations engine and i'm facing a DB design issue. My actual database looks like this:

MOVIES [ID,TITLE] 
KEYWORDS_TABLE [ID,KEY_ID] 
  • where ID is Foreign Key for MOVIES.id and KEY_ID is a key for a text keywords table

This is not the entire DB, but i showed here what's important for my problem. I have about 50,000 movies and about 1,3 milion keywords correlations, and basically my algorithm consists in extracting all the who have the same keywords with a given movie, then ordering them by the number of keywords correlations.

For example i looked for a movie similar to 'Cast away' and it returned 'Six days and six nights' because it had the most keywords correlations (4 keywords):

Island
Airplane crash
Stranded
Pilot

The algorithm is based on more factors, but this one is the most important and the most difficult for the approach.

Basically what i do now is getting all the movies that have at least one keyword similar to the given movie and then ordering them by other factors which are not important for a moment.

There wouldn't be any problem if there weren't so many records, a query lasts in many cases up to 10-20 seconds and some of them return even over 5000 movies. Someone already helped me on here (thanks Mark Byers) with optimizing the query but that's not enough because it takes too longer

SELECT DISTINCT M.title
FROM keywords_table K1
JOIN keywords_table K2
ON K2.key_id = K1.key_id
JOIN movies M
ON K2.id = M.id
WHERE K1.id = 4

So i thought it would be better if i pre-made those lists with movies recommendations for each movie, but i'm not sure how to design the tables.. whatever is it a good idea or how would you take this approach?

+1  A: 

@Supyxy a better approach for movie recommendations would be to use a neural network approach based on abstract features generated after analyzing the movie database (factors for generating those features could be keywords | people's ratings | director's name | actors, etc). A sample open source movie recommendation engine is provided by the Filmaster project. You can read about it on the wiki: http://filmaster.org/display/DEV/New+recommendation+engine and get the code from the bitbucket repository: http://bitbucket.org/filmaster/filmaster-test/src/tip/count_recommendations.cpp

But answering the problem you have in your current solution, if further query optimization fails, I would add some more redundancy into data be storing references to similar films for each film (computing could be done lazily or in a cron job daily as those values should not change often).

michuk