Hi
I have quite a large table storing the words contained in email messages:

mysql> explain t_message_words;
+----------------+---------+------+-----+---------+----------------+
| Field          | Type    | Null | Key | Default | Extra          |
+----------------+---------+------+-----+---------+----------------+
| mwr_key        | int(11) | NO   | PRI | NULL    | auto_increment |
| mwr_message_id | int(11) | NO   | MUL | NULL    |                |
| mwr_word_id    | int(11) | NO   | MUL | NULL    |                |
| mwr_count      | int(11) | NO   |     | 0       |                |
+----------------+---------+------+-----+---------+----------------+

The table contains about 100M rows.
mwr_message_id is a FK to the messages table
mwr_word_id is a FK to the words table
mwr_count is the number of occurrences of word mwr_word_id in message mwr_message_id

To calculate the most used words, I use the following query:

SELECT SUM(mwr_count) AS word_count, mwr_word_id
FROM t_message_words
GROUP BY mwr_word_id
ORDER BY word_count DESC
LIMIT 100;

It runs almost forever (more than half an hour on the test server):

mysql> show processlist;
+----+------+----------------+--------+---------+------+----------------------+-----------------------------------------------------
| Id | User | Host           | db     | Command | Time | State                | Info
+----+------+----------------+--------+---------+------+----------------------+-----------------------------------------------------
| 41 | root | localhost:3148 | tst_db | Query   | 1955 | Copying to tmp table | SELECT SUM(mwr_count) AS word_count, mwr_word_id
    FROM t_message_words
    GROUP BY mwr_word_id |
+----+------+----------------+--------+---------+------+----------------------+-----------------------------------------------------
3 rows in set (0.00 sec)

Is there anything I can do to "speed up" the query (apart from adding more ram, more cpu, faster disks)?

thank you in advance
stefano

P.S. EXPLAIN result:

mysql> EXPLAIN SELECT SUM(mwr_count) AS word_count, mwr_word_id
    -> FROM t_message_words
    -> GROUP BY mwr_word_id
    -> ORDER BY word_count DESC
    -> LIMIT 100;
+----+-------------+-----------------+-------+---------------+----------------------+---------+------+----------+---------------------------------+
| id | select_type | table           | type  | possible_keys | key                  | key_len | ref  | rows     | Extra                           |
+----+-------------+-----------------+-------+---------------+----------------------+---------+------+----------+---------------------------------+
|  1 | SIMPLE      | t_message_words | index | NULL          | IDX_t_message_words2 | 4       | NULL | 94823285 | Using temporary; Using filesort |
+----+-------------+-----------------+-------+---------------+----------------------+---------+------+----------+---------------------------------+
1 row in set (0.01 sec)
A: 

Perhaps an index on mwr_word_id would help. It's now part of a key, but you could add a key purely for that.
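
For example, a composite index on (mwr_word_id, mwr_count) would be a covering index for this query, letting MySQL read everything it needs from the index instead of the 100M-row table (the index name here is just illustrative):

ALTER TABLE t_message_words
    ADD INDEX idx_word_sum (mwr_word_id, mwr_count);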

You could also increase MySQL's cache sizes in the configuration file.

extraneon
Thanks, but unfortunately mwr_word_id is already indexed :( About the cache: could you please tell me which parameter in my.ini I should try to change? I've already modified innodb_buffer_pool_size (from 77MB to 512MB) without any significant change in execution time.
Stefano
@Stefano: "The maximum size for in-memory temporary tables is the minimum of the tmp_table_size and max_heap_table_size ", see http://dev.mysql.com/doc/refman/5.0/en/internal-temporary-tables.html. I think you increased the wrong setting :)
extraneon
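For instance, both limits can be raised for the current session before running the query (256M is only an example value; the same settings can also go in my.ini):

SET SESSION tmp_table_size      = 256 * 1024 * 1024;
SET SESSION max_heap_table_size = 256 * 1024 * 1024;  -- effective limit is the smaller of the two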
@Stefano It also doesn't hurt to use EXPLAIN on the query to see what it will do.
extraneon
@extraneon: thank you for pointing out that page in the MySQL documentation. I tried to increase both tmp_table_size and max_heap_table_size, but no luck :( I updated the question with the EXPLAIN result too.
Stefano
@Stefano Would it be acceptable to split the query? First run it without the ORDER BY into a temporary table explicitly defined in memory, and then do the ordered select from that table. The advantage? The rules for handcrafted temp tables are different, so they can perhaps be larger than automatic temp tables. Plus you can perhaps use MyISAM for the temp table, which should be a bit faster for a GROUP BY.
extraneon
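A sketch of the split extraneon describes (the table name tmp_word_counts is made up for the example; ENGINE=MEMORY could be used instead of MyISAM to force the table in memory):

CREATE TEMPORARY TABLE tmp_word_counts ENGINE=MyISAM
SELECT mwr_word_id, SUM(mwr_count) AS word_count
FROM t_message_words
GROUP BY mwr_word_id;

SELECT mwr_word_id, word_count
FROM tmp_word_counts
ORDER BY word_count DESC
LIMIT 100;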
Yes, it is acceptable and I'm investigating in that direction. I'll let you know the result. Thanks!
Stefano
+1  A: 

I did not understand whether you have an index on both mwr_message_id and mwr_word_id, or just on mwr_word_id, or any index other than the primary key. If you don't have an index on mwr_word_id (or one with mwr_word_id as its first field), I suggest you add one.

If you already have such an index, and this is a real pain because it is a very common scenario, I suggest you add some redundancy to the words table: a column that sums the total occurrences of each mwr_word_id across all mwr_message_id.

You can also add triggers to t_message_words to keep this redundant count up to date.
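
A sketch of the idea, assuming the words table is called t_words with primary key word_id (both names are guesses; adjust them to your schema). After a one-time backfill of the new column (essentially the slow query itself, run once), an AFTER INSERT trigger keeps it current; the UPDATE and DELETE triggers would follow the same pattern:

ALTER TABLE t_words ADD COLUMN total_count INT NOT NULL DEFAULT 0;

CREATE TRIGGER trg_mwr_insert AFTER INSERT ON t_message_words
FOR EACH ROW
    UPDATE t_words
    SET total_count = total_count + NEW.mwr_count
    WHERE word_id = NEW.mwr_word_id;

The top-100 query then becomes a simple ORDER BY total_count DESC on t_words.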

Fede
Yes, I already have both indexed. Unfortunately I cannot solve this issue by adding redundancy to the words table, since the "real" query (not shown here for simplicity) contains a join on the mwr_message_id field too.
Stefano
well, then you should show us the real query, so we can suggest more accurate answers :)
Fede
The answer seems to be that the query I posted cannot be sped up :( I could have posted the real query I'm facing, but it would have been difficult to reach an answer. Thank you for your suggestion; I'm doing something "similar" to solve the issue.
Stefano
A: 

UPD: If you need to run this query only once, just wait for it to complete. If you need to run it many times, create a table of unique words and update the counts in that table via triggers on every insert/update/delete.

Андрей Костенко
Sorry Andrii, but I cannot see the reason for such an index. mwr_message_id is not used in the query and mwr_count is not a condition. Why should this improve performance, in your opinion?
Stefano
Oh, I'm sorry... :-( If you need to run this query only once, just wait for it to complete. If you need to run it many times, create a table of unique words and update the counts via a trigger.
Андрей Костенко
I understand your point, thank you
Stefano