I am working at an movie recommendations engine and i'm facing a DB design issue. My actual database looks like this:
MOVIES [ID,TITLE]
KEYWORDS_TABLE [ID,KEY_ID]
- where ID is Foreign Key for MOVIES.id and KEY_ID is a key for a text keywords table
This is not the entire DB, but i showed here what's important for my problem. I have about 50,000 movies and about 1,3 milion keywords correlations, and basically my algorithm consists in extracting all the who have the same keywords with a given movie, then ordering them by the number of keywords correlations.
For example i looked for a movie similar to 'Cast away' and it returned 'Six days and six nights' because it had the most keywords correlations (4 keywords):
Island
Airplane crash
Stranded
Pilot
The algorithm is based on more factors, but this one is the most important and the most difficult for the approach.
Basically what i do now is getting all the movies that have at least one keyword similar to the given movie and then ordering them by other factors which are not important for a moment.
There wouldn't be any problem if there weren't so many records, a query lasts in many cases up to 10-20 seconds and some of them return even over 5000 movies. Someone already helped me on here (thanks Mark Byers) with optimizing the query but that's not enough because it takes too longer
SELECT DISTINCT M.title
FROM keywords_table K1
JOIN keywords_table K2
ON K2.key_id = K1.key_id
JOIN movies M
ON K2.id = M.id
WHERE K1.id = 4
So i thought it would be better if i pre-made those lists with movies recommendations for each movie, but i'm not sure how to design the tables.. whatever is it a good idea or how would you take this approach?