I have the following SQL query:

select expr1, operator, expr2, count(*) as c 
from log_keyword_fulltext 
group by expr1, operator, expr2 
order by c desc limit 2000;

Problem: the count(*) in my ORDER BY is killing my application, probably because it can't use an index. I would like to know if there is any way to make it faster, for example with a SELECT inside another SELECT, or something like that.

My SELECT explained:

+----+-------------+----------------------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
| id | select_type | table                | type  | possible_keys | key   | key_len | ref  | rows   | Extra                                        |
+----+-------------+----------------------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | log_keyword_fulltext | index | NULL          | expr1 | 208     | NULL | 110000 | Using index; Using temporary; Using filesort | 
+----+-------------+----------------------+-------+---------------+-------+---------+------+--------+----------------------------------------------+

UPDATED:

I tried a subquery like this:

select * from (select b.expr1,b.operator,b.expr2,count(*) as c 
from log_keyword_fulltext b group by b.expr1,b.operator,b.expr2) x 
order by x.c desc limit 2000;

It works, but it is not faster. Here is the EXPLAIN:

+----+-------------+------------+-------+---------------+-------+---------+------+--------+----------------+
| id | select_type | table      | type  | possible_keys | key   | key_len | ref  | rows   | Extra          |
+----+-------------+------------+-------+---------------+-------+---------+------+--------+----------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL  | NULL    | NULL |  38398 | Using filesort | 
|  2 | DERIVED     | b          | index | NULL          | expr1 | 208     | NULL | 110000 | Using index    | 
+----+-------------+------------+-------+---------------+-------+---------+------+--------+----------------+

You can see that it is no longer using a temporary table, but the performance is the same. Any recommendations?

A: 

Counting and then sorting by the count is going to be a killer. I would suggest building a temporary table with the counts, and then doing a select ... order by on that.

Not sure if this works in MySQL, but in PostgreSQL or Oracle, that would be

create table foo as 
   select expr1, operator, expr2, count(*) as c
   from log_keyword_fulltext 
   group by expr1, operator, expr2;
select * from foo order by c desc limit 2000;

Also, you're going to have to compute all the counts in order to sort them, so the LIMIT clause isn't going to save you from doing all those calculations.

Paul Tomblin
That's already what MySQL does automagically in this case.
niXar
@niXar, that would explain the "Using temporary" in the query plan? OK, never mind.
Paul Tomblin
+1  A: 

Always try taking the count of a single column instead of count(*), as count(*) takes a count over the permutation of each column of each row, so it takes longer.

Eg:

select expr1, operator, expr2, count(expr1) as c 
from log_keyword_fulltext 
group by expr1, operator, expr2 
order by c desc limit 2000;
Samiksha
? That doesn't make any sense at all; if it's a single table then the number of rows is the same whether you use (*) or (expr1) and the query translator/optimizer is going to produce the same action no matter which of the two you choose.
Jason S
It doesn't make sense because, if expr1 accepts NULL, it won't represent all the records.
VP
It's documented as part of the SQL standard that "COUNT(*)" calculates without reference to any fields.
le dorfier
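
To make the NULL distinction above concrete (the table and its three rows are hypothetical, not from the original post): COUNT(*) counts rows, while COUNT(expr1) skips rows where expr1 is NULL, so the two can differ:

    -- hypothetical table sample_rows with expr1 values: 'a', NULL, 'a'
    SELECT COUNT(*), COUNT(expr1) FROM sample_rows;
    -- COUNT(*) = 3, COUNT(expr1) = 2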
A: 

What do you mean by "killing your application"? What is the context? How often do you run this query? What is going on on the database while you're running this query? Does this particular result have to be real-time? What are the conditions (inserts / s, selects / s, db size, etc)

Here's what you could do:

  1. Store the count in a separate table, which you'd update with triggers on insert/delete

  2. If you can't coerce MySQL into doing this in a single pass with some magic, build an indexed temporary table of counts. The update-or-insert loop boils down to an upsert per row, which in MySQL is (EXP_T and OP_T are placeholders for your actual column types):

    CREATE TEMPORARY TABLE t (
        e1 EXP_T, op OP_T, e2 EXP_T,
        count INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (e1, op, e2),
        INDEX (count)
    );
    -- one pass over the log table: insert each row, bumping the count on duplicates
    INSERT INTO t (e1, op, e2)
        SELECT expr1, operator, expr2 FROM log_keyword_fulltext
        ON DUPLICATE KEY UPDATE count = count + 1;
    SELECT * FROM t ORDER BY count DESC LIMIT 2000;
    

Option 1 is probably what you want. And forget indexes: this query has to scan the whole table anyway.
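
A minimal sketch of option 1 for inserts, assuming MySQL triggers; the summary table name (keyword_counts), trigger name, and column types are illustrative, not from the original post:

    -- summary table keyed on the grouped columns, with an index on the count
    CREATE TABLE keyword_counts (
        expr1    VARCHAR(100) NOT NULL,
        operator VARCHAR(20)  NOT NULL,
        expr2    VARCHAR(100) NOT NULL,
        c        INT NOT NULL DEFAULT 1,
        PRIMARY KEY (expr1, operator, expr2),
        INDEX (c)
    );

    DELIMITER //
    CREATE TRIGGER trg_log_keyword_insert
    AFTER INSERT ON log_keyword_fulltext
    FOR EACH ROW
    BEGIN
        INSERT INTO keyword_counts (expr1, operator, expr2)
        VALUES (NEW.expr1, NEW.operator, NEW.expr2)
        ON DUPLICATE KEY UPDATE c = c + 1;
    END//
    DELIMITER ;

    -- the expensive daily query then becomes a cheap indexed read
    SELECT expr1, operator, expr2, c
    FROM keyword_counts ORDER BY c DESC LIMIT 2000;

A matching AFTER DELETE trigger that decrements c would be needed if rows are ever removed from the log table.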

niXar
It is a daily cron job. It hangs with more than 200,000 records. I will check your tip!
VP
Isn't your "2" pretty much exactly what I said, and what you voted me down for?
Paul Tomblin
BTW: I'm not challenging you, I just want to know if I didn't make myself clear.
Paul Tomblin
I didn't vote you down...
VP
Well, I was assuming it was @niXar who voted it down, but that's not really important compared to whether his "2" is any different from "create foo as select expr1, operator, expr2, count(*) as c from log_keyword_fulltext group by expr1, operator, expr2"
Paul Tomblin
Did I vote you down? I can't vote down, AFAIK
niXar
+2  A: 

You are running a query which needs to scan the whole table, and this does not scale. There is no WHERE clause, so it absolutely must scan the whole thing.

Consider maintaining some summary table(s) instead of doing this query often.
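
Since the query runs from a daily cron anyway, one simple way to maintain such a summary is to rebuild it off-peak and serve reads from it; the table name keyword_summary is illustrative:

    -- nightly rebuild; readers hit the small indexed summary, not the log
    DROP TABLE IF EXISTS keyword_summary;
    CREATE TABLE keyword_summary AS
        SELECT expr1, operator, expr2, COUNT(*) AS c
        FROM log_keyword_fulltext
        GROUP BY expr1, operator, expr2;
    ALTER TABLE keyword_summary ADD INDEX (c);

    SELECT * FROM keyword_summary ORDER BY c DESC LIMIT 2000;

This pays the full-scan cost once per day instead of on every read.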

MarkR
A: 

The best way to prevent a table scan would be to add a cover index for those fields you regularly access. There is a one-time cost to create the index. There is also some additional cost for INSERT and DELETE operations on the table so the index can be updated.

Cover indexes prevent the database from having to read the entire record into memory in order to access the values for the few fields you care about. The entire query can be run off the index.

ALTER TABLE log_keyword_fulltext ADD INDEX idx_name(expr1, operator, expr2)

If these are not actual fields, but rather operations on the fields, such as left(foo,20), you can actually index the part of the field that you will use in future SELECT or WHERE clauses.
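
For example, a prefix index covering only the leading characters of the string columns; the index name and the 20-character prefix length are illustrative:

    -- index only the first 20 characters of each expression column
    ALTER TABLE log_keyword_fulltext
        ADD INDEX idx_prefix (expr1(20), operator, expr2(20));

Note that a prefix index cannot act as a covering index for this query, since the full column values are not stored in it.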

See this page for other optimization ideas: http://dev.mysql.com/doc/refman/5.0/en/query-speed.html

Chris
Hi, I already do that; those are my indexes. My problem is that my production table has 12,277,737 records. Regards, Victor
VP
+1  A: 

What am I missing? I don't see a WHERE clause. It looks to me like you're requesting a table scan.

If you are counting on your LIMIT clause to save you, you're out of luck: the LIMIT is applied after the COUNT aggregate has been calculated for every group.

le dorfier