views:

96

answers:

3

This query was working fine when the database was small, but now that there are millions of rows in the database, I am realizing I should have looked at optimizing this earlier. It is looking at over 600,000 rows and is Using where; Using temporary; Using filesort (which leads to an execution time of 5-10 seconds). It is using an index on the field 'battle_type.'

SELECT username, SUM( outcome ) AS wins, COUNT( * ) - SUM( outcome ) AS losses
FROM tblBattleHistory
WHERE battle_type =  '0' && outcome <  '2'
GROUP BY username
ORDER BY wins DESC , losses ASC , username ASC 
LIMIT 0 , 50
+5  A: 

It appears you need an index on username, battle_type, outcome or username, outcome, battle_type.

Gilbert Le Blanc
I tried both and then ran EXPLAIN, but it is still just using battle_type as the index. Any idea why that would be? It doesn't show it as a possible key either.
James Simpson
What's the difference in indexing in different orders?
Kerry
Unfortunatelly, both indexes won't be useful here since grouping happens after the `WHERE` conditions.
newtover
@Kerry: Generally, you specify the columns with the most grainularity first, then the columns with lesser grainularity.
Gilbert Le Blanc
@newtover: Yep, you're right. I guess the next index to try would be outcome, battle_type, username.
Gilbert Le Blanc
@Gilbert Le Blanc: miss once again =), see my answer below.
newtover
Newb questoin, what is granularity?
Kerry
@Kerry: Granularity is a measure of how unique a SQL table column is. The id column has a high granularity. Every element is unique. A column with an extremely low granularity would be a gender column. The only values are "M", "F", and null. You want to index columns with high granularity.
Gilbert Le Blanc
@Kerry: A better term is cardinality (http://en.wikipedia.org/wiki/Cardinality)
newtover
Got it -- thank you both!
Kerry
+3  A: 

First thing would be to make sure you have good indexes (as others have mentioned).

However, it looks like you're creating a leaderboard of some sort for a web page. My first question would be - do you really need to execute this query in real time? Could you create a table in your database (or add a wins and losses column to the users table) with the results of this query and simply refresh it periodically?

Eric Petroelje
+3  A: 

Lets see, what you are doing:

  1. Find rows that are of battle_type = 0 and outcome < 2
  2. Order by username for grouping
  3. Compute aggregations and fold rows to distinct usernames
  4. Order by dinamically calculated fields

On steps 3 and 4 you have no influence. Step 2 in its current form can not benefit from any indices, since outcome < 2 is range condition, index on (battle_type, outcome, username) looks very tempting though.

Assuming that outcome is enumeration of 0,1,2,3... you can change the range condition to equality comparison and benefit from index on (battle_type, outcome, username):

SELECT username, SUM( outcome ) AS wins, COUNT( * ) - SUM( outcome ) AS losses
FROM tblBattleHistory
WHERE battle_type = 0 AND outcome IN (0, 1)
GROUP BY username
ORDER BY wins DESC , losses ASC , username ASC 
LIMIT 0 , 50

If outcome is not enumeration, index on (battle_type, outcome) will do. Index on (battle_type) only is excess now, since battle_type is a prefix in the compound index.

newtover
This helped some, but it was still looking at over 500,000 rows, so I'm just going to make a table to cache the results and update every 10 minutes.
James Simpson