views:

71

answers:

5

Hi guys... I've a MySQL question

I've two tables (posts and authors) in a one to many relationship (since each post is written by an author and an author can write multiple posts).

So here are the tables:

Authors:
   id:BIGINT, name:VARCHAR(255)

Posts: 
   id:BIGINT, author_id:BIGINT, body:TEXT

I've got 700,000 posts and 60,000 authors.

If I choose an author (e.g. author_id = 45) and I want a random post written by him I write:

SELECT * FROM Posts WHERE author_id = 45 ORDER BY RAND() LIMIT 1;

I know this is right, but when I got 4,000 simultaneous people online it takes about 6 secs..

Maybe indexing author_id column in Posts table would speed up things?

Thank you all! :)

+5  A: 

Indexing should mirror your most popular WHERE clause scenarios.

In this particular case, create your index, then change your query to this:

SELECT id,author_id,body 
FROM Posts 
WHERE author_id = 45 
ORDER BY RAND() 
LIMIT 1;

This will prevent a schema lookup prior to the search thereby increasing performance.

SELECT * is evil for high frequency queries.

randy melder
+2  A: 

Yes, you definitely should add the index.

CREATE INDEX Post_author_id ON Posts(author_id);

As further evidence, run

EXPLAIN SELECT * FROM Posts WHERE author_id = 45 ORDER BY RAND() LIMIT 1;
Frank Krueger
Is your syntax to add an index different in any way from this one?ALTER TABLE Posts ADD INDEX ( author_id )
checcco
Yes, those syntaxes are different. :-) `ALTER TABLE` isn't very portable between databases, whereas `CREATE INDEX` is pretty darned portable. I hate SQL so have only memorized the portable stuff.
Frank Krueger
A: 

If you haven't and index on author_id, definitely put one on it. Also I'm not sure ORDER BY RAND() is not responsible for the performance drawback. Try adding the index and it should already improve dramatically.

Gergely Orosz
A: 

Especially in a situation where you read your data a lot more than you update your data, be generous when setting up indexing. Anything you'll ever have in a where clause should be indexed.

David Oneill
A: 

A [possibly clustered] index on Author_id will definitively help.

There appears to be an additional risk factor with the ORDER BY RAND() part. Essentially this clause causes SQL to dynamically assign a random number to each row (for a given Author_id), and to order these. This could become a bottleneck as some prolific authors start having hundred and thousands of posts.

mjv