views:

69

answers:

3

I have a mysql database with movies as follows:

MOVIES(id,title)

KEYWORDS_TABLE(id,key_id) [id is referenced to movies.id, key_id is refernced to keywords.id]

KEYWORDS(id,keyword) //this doesn't matter on my example..

Basically i have movies with their titles and plot keywords for each one, i want to select all movies that have the same keywords with with a given movie id.

I tried something like:

SELECT key_id FROM keywords_table WHERE id=9

doing that in php and storing all the IDs in an array $key_id.. then i build another select that looks like:

SELECT movies.title FROM movies,keywords_table WHERE keywords_table.key_id=$key_id[1] OR keywords_table.key_id=$key_id[2] OR ......... OR keywords_table.key_id=$key_id[n]

This kinda works but it takes too much time as we talk about a database with thousands of thousands of records.

So, any suggestions?? thanks!

+3  A: 

One thing you could improve... Instead of writing x = a OR x = b OR x = c you can shorten it to just this: x IN (a, b, c).

SELECT movies.title
FROM movies,keywords_table
WHERE keywords_table.key_id IN ($key_id[1], $key_id[2], ..., $key_id[n])

Note also that you are missing a join condition in your query. You are currently doing a CROSS JOIN, also known as a cartesian product. I think you want this:

SELECT movies.title
FROM movies
JOIN keywords_table
ON movies.id = keywords_table.id
WHERE keywords_table.key_id IN ($key_id[1], $key_id[2], ..., $key_id[n])

This query can return the same movie more than once so you might want to add DISTINCT to remove the duplicates. Also you can do the whole thing in one query instead of two as a further optimization:

SELECT DISTINCT M.title
FROM keywords_table K1
JOIN keywords_table K2
ON K2.key_id = K1.key_id
JOIN movies M
ON K2.id = M.id
WHERE K1.id = 4

Regarding the performance, make sure that you have the primary key set to (id) on movies and to (key_id, id) on keywords_table.

Mark Byers
Joining is probably more efficient... and won't fail after 2100 records as in another question.
MvanGeest
I didn't mean that... both operations could be substituted by a single query with multiple `JOIN` s
MvanGeest
And this is reason why using the implicit joining is bad. Here was an unnoticed cross join. It is a poor practice to use implicit joins. Glad to see you use the explicit syntax instead of just fixing the implicit syntax to remove the cross join.
HLGEM
Deleted my previous comment after you fixed it. That final query sure looks nice. I'm going to test it right now.
MvanGeest
Yap it worked for me too, but it takes too much time (for my database), it seems building up the list with related movies on "real time" is not a good choice, i may try building some predefined lists with relations between movies or something.. what do you suggest guys??
Supyxy
Sorry - had to delete comment saying that it worked. Supyxy, can you run an `EXPLAIN` on that query and give us the amount of `rows` it's selecting? It should show four steps. phpMyAdmin will do this easily.
MvanGeest
Table K1 - 144 rowsTable K2 - 21 rowsTable M - 1 rowWell, i don't think it's a good solution, each movie has at least 50 keywords, there are about 1 milion overall, and the script must check for every keyword of a given movie if there are other movies with the same keyword.. it took about 1 minute lol:)
Supyxy
@Supyxy: Have you created the primary keys as I suggested? Can you post your table definitions? This query shouldn't be that slow.
Mark Byers
Movies - id[int(8), Primary Key, auto_increment, Unique], Keyword_table - id[int(8), Index, Foreign Key movies.id] / key_id[int(8), Index, Foreign Key keywords.id] .. should i remove the indexes and add Primary keys on them?
Supyxy
Yep i changed them to primary keys, it's quite better but it takes about 5 seconds, which is way too long. Will look for other methods, maybe i should redesign the tables and build some predefined lists with related movies like creating a table for each keyword and listing there all movies..Would it be fine if i open a new thread about optimizing my DB?
Supyxy
A: 

Try using the 'in' keyword instead of building a large number of boolean operations.

SELECT movies.title FROM movies WHERE keyword_table.key_id IN ($key_id[1],..,$key_id[n])
sabauma
A: 

Use a subselect:

SELECT DISTINCT m.title
FROM movies m
WHERE id IN (
    SELECT id
    FROM keywords_table
    WHERE id = 9);
stephenr
I think you are missing a join.
Mark Byers
Or two. Or five. :)
MvanGeest
yep. Didn't read the question properly.:(
stephenr