views:

217

answers:

5

Here's the query:

SELECT COUNT(*) AS c, MAX(`followers_count`) AS max_fc, 
       MIN(`followers_count`) AS min_fc, MAX(`following_count`) AS max_fgc,
       MIN(`following_count`) AS min_fgc, SUM(`followers_count`) AS fc,
       SUM(`following_count`) AS fgc, MAX(`updates_count`) AS max_uc,
       MIN(`updates_count`) AS min_uc, SUM(`updates_count`) AS uc
FROM `profiles`
WHERE `twitter_id` IN (SELECT `followed_by` 
                       FROM `relations` 
                       WHERE `twitter_id` = 123);

The two tables are profiles and relations. Both have over 1,000,000 rows, InnoDB engine. Both have indexes on twitter_id, relations has an extra index on (twitter_id, followed_by). The query is taking over 6 seconds to execute, this really frustrates me. I know that I can JOIN this somehow, but my MySQL knowledge is not so cool, that's why I'm asking for your help.

Thanks in advance everyone =)

Cheers, K ~

Updated

Okay I managed to get down to 2,5 seconds. I used INNER JOIN and added the three index pairs. Here's the EXPLAIN results:

id, select_type, table, type, possible_keys, 
    key, key_len, ref, rows, Extra

1, 'SIMPLE', 'r', 'ref', 'relation', 
    'relation', '4', 'const', 252310, 'Using index'

1, 'SIMPLE', 'p', 'ref', 'PRIMARY,twiter_id,id_fc,id_fgc,id_uc', 
    'id_uc', '4', 'follerme.r.followed_by', 1, ''

Hope this helps.

Another update

Here are the SHOW CREATE TABLE statements for both tables:

CREATE TABLE `profiles` (
  `twitter_id` int(10) unsigned NOT NULL,
  `screen_name` varchar(45) NOT NULL default '',
  `followers_count` int(10) unsigned default NULL,
  `following_count` int(10) unsigned default NULL,
  `updates_count` int(10) unsigned default NULL,
  `location` varchar(45) default NULL,
  `bio` varchar(160) default NULL,
  `url` varchar(255) default NULL,
  `image` varchar(255) default NULL,
  `registered` int(10) unsigned default NULL,
  `timestamp` int(10) unsigned default NULL,
  `relations_timestamp` int(10) unsigned default NULL,
  PRIMARY KEY  USING BTREE (`twitter_id`,`screen_name`),
  KEY `twiter_id` (`twitter_id`),
  KEY `screen_name` USING BTREE (`screen_name`,`twitter_id`),
  KEY `id_fc` (`twitter_id`,`followers_count`),
  KEY `id_fgc` (`twitter_id`,`following_count`),
  KEY `id_uc` (`twitter_id`,`updates_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

CREATE TABLE `relations` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `twitter_id` int(10) unsigned NOT NULL default '0',
  `followed_by` int(10) unsigned default NULL,
  `timestamp` int(10) unsigned default NULL,
  PRIMARY KEY  USING BTREE (`id`,`twitter_id`),
  UNIQUE KEY `relation` (`twitter_id`,`followed_by`)
) ENGINE=InnoDB AUTO_INCREMENT=1209557 DEFAULT CHARSET=utf8

Wow, what a mess =) Sorry!

+3  A: 

A join would look something like this:

SELECT COUNT(*) AS c,
MAX(p.`followers_count`) AS max_fc,
MIN(p.`followers_count`) AS min_fc,
MAX(p.`following_count`) AS max_fgc,
MIN(p.`following_count`) AS min_fgc,
SUM(p.`followers_count`) AS fc,
SUM(p.`following_count`) AS fgc,
MAX(p.`updates_count`) AS max_uc,
MIN(p.`updates_count`) AS min_uc,
SUM(p.`updates_count`) AS uc
FROM `profiles` AS p
INNER JOIN `relations` AS r ON p.`twitter_id` = r.`followed_by`
WHERE r.`twitter_id` = 123;

To help optimize it you should run EXPLAIN SELECT ... on both queries.

Greg
A join would slow this way down.
corymathews
This decreased the execution time from 6s to 3s
kovshenin
+1  A: 
SELECT COUNT(*) AS c,
  MAX(`followers_count`) AS max_fc, MIN(`followers_count`) AS min_fc,
  MAX(`following_count`) AS max_fgc, MIN(`following_count`) AS min_fgc,
  SUM(`followers_count`) AS fc, SUM(`following_count`) AS fgc,
  MAX(`updates_count`) AS max_uc, MIN(`updates_count`) AS min_uc, SUM(`updates_count`) AS uc
FROM `profiles`
JOIN `relations`
  ON (profiles.twitter_id = relations.followed_by)
WHERE relations.twitted_id = 123;

might be a bit faster, but you'll need to measure and check if that is indeed so.

Alex Martelli
I went with the above method, but I think there's no difference in these two.
kovshenin
Yep, the differences in how/if you break the lines, or table aliases and field qualifications, don't matter (except you'd need to use aliases and/or qualifications if needed to disambiguate, but I see no trace of that here).
Alex Martelli
+1  A: 

Create the following composite indexes:

profiles (twitter_id, followers_count)
profiles (twitter_id, following_count)
profiles (twitter_id, updates_count)

and post the query plan, for God's sake.

By the way, how many rows does this COUNT(*) return?

Update:

Your table rows are quite long. Create a composite index on all the fields you select:

profiles (twitter_id, followers_count, following_count, updates_count)

so that the JOIN query can retrieve all the values it need from that index.

Quassnoi
Thanks! Created the index pairs, id_fc, id_fgc, id_uc. Seems to have spared me ~300 ms. Query plan? You mean the EXPLAIN results? Look at the qestion update.
kovshenin
COUNT(*) returns 195436 rows. Got some errors there, unknown column p.twitter_id in the where clause (for all of them I guess). After I added p to every FROM profiles it said Unknown column 'r.followed_by' in 'on clause'. Not sure what to do next.Thanks so much for your help.
kovshenin
@kovshenin: as for the unknown column, see the post update.
Quassnoi
@kovshenin: as for the error, well, 3 seconds is quite a decent time for 200,000 rows. Could you please post complete `CREATE TABLE` statements for you `profiles` and `relations`?
Quassnoi
Hey, fixed the previous errors. Sorry didn't figure it out myself. Anyways, now I'm stuck with this one: Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause. Shoud I group anything?
kovshenin
Updated my post with the `CREATE TABLE` statements. 3 seconds is decent, but not enough and I know there is a way to go below that and it's only a matter of time before I find out, haha =) Although it may take me two or three months to read that MySQL High Performance book I bought a few weeks ago. =))
kovshenin
@kovshenin: ummm... sorry, I didn't get your logic right. Forget the last query :) Just post your table definitions if you please.
Quassnoi
@kovshenin: see post update.
Quassnoi
Hey, thanks for all your help buddy, I'm down to ~2 seconds =)
kovshenin
+1  A: 

count(*) is a very expensive operation under the InnoDB Engine, have you tried this query without that piece? If it's causing the most processing time then maybe you could keep a running value instead of querying for it each time.

Gandalf
That only really applies when there's no WHERE
Greg
Hmm, hadn't heard that before but guess it makes sense.
Gandalf
Removing COUNT(*) didn't change anything :(
kovshenin
+1  A: 

I'd approach this problem from a programmers angle; I'd have a seperate table (or storage area somewhere) that stored the max,min and sum values associated with each field in your original query and update those values everytime I updated and added a table record. (although deleting may be problematic if not handled correctly).

After the original query to populate these values is complete (which is the almost the same as the query you posted), you're essentially reducing your final query to getting one row from a data table, rather than computing everything all at once.

Phill
same as gandolfs idea above really!
Phill
You mean like having TRIGGERS update those values every time? Hmm, might be a good idea.
kovshenin
Yup, its been a while since I've used mySQL, wern't sure if they had triggers or not, but yes thats the general idea :)
Phill