views: 238

answers: 7

I am using the following MySQL query in a PHP script on a database that contains over 370,000,000 (yes, three hundred and seventy million) rows. I know that it is extremely resource intensive and it takes ages to run this one query. Does anyone know how I can either optimise the query or get the information in another way that's quicker?

Table Info:

game | bigint, unsigned, Primary Key
win  | bit(1)
loss | bit(1)

Query:

SELECT MID(game,{$len},1) AS move,
       COUNT(*) AS games,
       SUM(win) AS wins,
       SUM(loss) AS losses
FROM games
WHERE game>{$something} AND game<{$something_else}
GROUP BY move

Thanks in advance for your help!

+5  A: 

The only suggestion I can make is to use a summary table that precalculates all the counts and sums, and to update it with a trigger whenever the games table changes.
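
A minimal sketch of that idea, assuming a hypothetical `move_stats` table keyed on the game prefix and the digit that follows it (all names here are illustrative, not from the question):

-- Hypothetical summary table: one row per (game prefix, next digit).
CREATE TABLE move_stats (
    prefix VARCHAR(14)     NOT NULL,  -- first len-1 digits of the game number
    move   CHAR(1)         NOT NULL,  -- the digit at position len
    games  BIGINT UNSIGNED NOT NULL DEFAULT 0,
    wins   BIGINT UNSIGNED NOT NULL DEFAULT 0,
    losses BIGINT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (prefix, move)
);

-- Keep it current as games are inserted; the 15 positions are assumed
-- from the 15-digit game identifier mentioned in the comments below.
DELIMITER //
CREATE TRIGGER games_after_insert AFTER INSERT ON games
FOR EACH ROW
BEGIN
    DECLARE p TINYINT UNSIGNED DEFAULT 1;
    WHILE p <= 15 DO
        INSERT INTO move_stats (prefix, move, games, wins, losses)
        VALUES (MID(NEW.game, 1, p - 1), MID(NEW.game, p, 1),
                1, NEW.win, NEW.loss)
        ON DUPLICATE KEY UPDATE
            games  = games  + 1,
            wins   = wins   + NEW.win,
            losses = losses + NEW.loss;
        SET p = p + 1;
    END WHILE;
END//
DELIMITER ;

The stats query then becomes a primary-key lookup on `move_stats` instead of a scan over millions of `games` rows.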

slipbull
That would require hundreds of millions more rows than are currently used and more space per row too. The table's already almost 10GiB as it is, so I don't think that's viable.
PhantomCode
But your main problem is that the calculations are too costly. Using some more disk space to build a 'cache' table may help.
slipbull
OK, I think I'll look into building some sort of cache system. I'm still worried by the space it'd take up, though.
PhantomCode
A: 

I would start by running EXPLAIN on the query, or by profiling it.
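
For example, on the query from the question (the range bounds stay as placeholders):

EXPLAIN
SELECT MID(game,{$len},1) AS move,
       COUNT(*) AS games,
       SUM(win) AS wins,
       SUM(loss) AS losses
FROM games
WHERE game>{$something} AND game<{$something_else}
GROUP BY move;

That shows whether the primary key is used for the range, how many rows MySQL expects to scan, and whether it needs a temporary table and a filesort for the GROUP BY. To time the individual execution stages (assuming a MySQL version with profiling support):

SET profiling = 1;
-- run the query here
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1;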

Sam Brightman
A: 

If you're read heavy, consider keeping and maintaining an aggregated table over the data you commonly query.

nos
Unfortunately I use all the different values for `$game` with approximately equal frequency.
PhantomCode
That just means you aggregate upon the different values of $games that you will need. Even if that ranges into the millions, it'll likely be a lot less than the individual records.
nos
A: 

Sounds to me like you could denormalize this and create a "moves" table, recording stats per "move", not only per "game".
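
A rough sketch of what that could look like, with hypothetical names (one row per game and move position, written when the game is recorded):

-- Hypothetical denormalised table: one row per (game, position).
CREATE TABLE moves (
    game BIGINT UNSIGNED  NOT NULL,
    pos  TINYINT UNSIGNED NOT NULL,  -- 1..15
    move CHAR(1)          NOT NULL,  -- the digit at that position
    PRIMARY KEY (game, pos),
    KEY idx_pos_move (pos, move)
);

The aggregate can then GROUP BY an indexed `move` column instead of a MID() expression, at the cost of roughly 15 rows per game.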

Sorin Mocanu
A: 

You can "buy speed" by sacrificing storage space, or conserve storage space at the cost of performance. As your problem is speed, you'll need some precalculation. And yes, some profiling of the query.

BTW, the "big ones" tend to have different configurations (different hardware and settings) for OLTP (serving the actual transactions in real time) and DW (analysing large amounts of data).

Csaba Kétszeri
A: 

The MID() function is what is killing this query. MySQL has to create a temporary table in memory to evaluate the MID() expression, and then filesort that table because of the GROUP BY.

I'm assuming that $game is the type of game. (checkers, chess, tic tac toe)

I would hang another table off it for the type of game. That allows your GROUP BY to make use of an index, which would be much faster.

I suggest something like:

[game]
game bigint unsigned
win bit
loss bit
game_type_id bigint unsigned

[game_type]
game_type_id bigint unsigned
game_type_desc varchar(13)
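
In DDL form that would be roughly (a sketch, using the same assumed names as above):

CREATE TABLE game_type (
    game_type_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    game_type_desc VARCHAR(13)     NOT NULL,
    PRIMARY KEY (game_type_id)
);

CREATE TABLE game (
    game         BIGINT UNSIGNED NOT NULL,
    win          BIT(1)          NOT NULL,
    loss         BIT(1)          NOT NULL,
    game_type_id BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (game),
    KEY idx_game_type (game_type_id)  -- lets the GROUP BY use an index
);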

Be careful with ALTER statements on a table this large. Always make a backup before you issue an ALTER.

txyoji
Actually, `$game` is a 15-digit quinary number. :P Also, as it says in the question, MID() has to be able to accept 5 different length values, so that would be one HUGE extra table!
PhantomCode
It's creating part of that huge extra table in memory every time the query is run, and not giving you the benefit of an index.
txyoji
+1  A: 

Straight off the bat, I would stop using the MID() call in both the SELECT expression and the GROUP BY. Depending on your query conditions, MySQL will not necessarily cache that as a single expression whilst parsing, so at least try this:

SELECT MID(game,{$len},1) AS move,
       COUNT(*) AS games,
       SUM(win) AS wins,
       SUM(loss) AS losses
FROM games
WHERE game LIKE '{$game}%'
GROUP BY move;

Not the biggest change in the world, but it should make a small difference. Other than that, I second that the only real way to optimise this, short of changing the way you store the data, is to precalculate these values and increment them when a game finishes.
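
For example, with a summary table like the one sketched in the first answer, finishing a game would just bump a handful of counters (the literal values here are illustrative):

INSERT INTO move_stats (prefix, move, games, wins, losses)
VALUES ('12340', '2', 1, 1, 0)
ON DUPLICATE KEY UPDATE
    games  = games  + 1,
    wins   = wins   + VALUES(wins),
    losses = losses + VALUES(losses);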

Nathan Kleyn
Oh, great, thanks. I didn't realise I could `GROUP BY` a column I just made up. Also, I should note that I've changed the `LIKE` to a `> x AND < y` condition.
PhantomCode
Your query should run a lot faster with the GROUP BY change, and with >'s and <'s instead of LIKE. LIKE has to do pattern matching, whereas > and < are simple comparisons that the optimizer can evaluate directly as an index range scan.
Nathan Kleyn