views:

228

answers:

3

Howdy peeps.

I hope you can help me with this one. I got a twitter service ( http://foller.me ) which is about to be updated to public beta 2, but I don't want to do this until I could give the pages out in at least 2 or 3 seconds. The current version is simple enough, but the dev version I'm working on is quite complex.

It's all about my relations and profiles tables which both contain ~ 2 million rows. The profile table contains general twitter user info and the relations table contains [twitter_id, followed_by] entries which are both in a UNIQUE index. There's a cron job that runs every once in a while and asks the Twitter API about the users followers and inserts data into the relations database.

I'm using InnoDB, so the table isn't being locked, but updated frequently, thus the query cache isn't getting enough hits, especially when the cron job is executing.

I've got statements like SELECT screen_name FROM profiles WHERE twitter_id IN (SELECT followed_by FROM relations WHERE twitter_id = 'kovshenin'). I'm sure that's not a very good way to do it and these are the ones (plus lots of others) that come up in the mysql-slow log when I turn it on.

Anyways, I need some general tips on how to acomplish good runtime for my project.

Thanks so much.

+3  A: 

Try to rewrite your query as follows:

SELECT  screen_name
FROM    profiles p
WHERE   EXISTS
        (
        SELECT  1
        FROM    relations r
        WHERE   r.twitter_id = 'kovshenin'
                AND r.followed_by = p.twitter_id
        )

Also, since you have a UNIQUE index, you will most probably benefit from rewriting your query as a JOIN:

SELECT  p.screen_name
FROM    relations r
JOIN    profiles p
ON      p.twitter_id = r.followed_by
WHERE   r.twitter_id = 'kovshenin'

You may also benefit from creating several indexes with frequently used columns at the expense of extra DML overhead.

For instance, this query:

SELECT  p.screen_name
FROM    relations r
JOIN    profiles p
ON      p.twitter_id = r.followed_by
WHERE   r.twitter_id = 'kovshenin'

will not even look into profiles, if you create a composite index on profiles (twitter_id, screen_name).

If selecting the screen_name (and only the screen_name, i. e. it's always the only column in the SELECT list) is a query common enough for your application, you should consider creating such an index.

Quassnoi
+1  A: 

Also be sure to check queries in the slow query log using EXPLAIN on the command line

EXPLAIN SELECT * FROM PROFILES

This will show you what's slowing the query down (no index, no joins etc) and help debugging the slower ones. You can also check out this answer for information on the MySQL profiler to get really in depth timings for table locks, query cache checks/writes etc

Andy