views:

78

answers:

2

The goal of this query is to get a total of unique records (most recent per IP, by IP) per ref ID.

SELECT COUNT(DISTINCT ip), GROUP_CONCAT(ref.id)
FROM `sess` sess
JOIN `ref` USING(row_id)
WHERE sess.time BETWEEN '2010-04-21 00:00:00' AND '2010-04-21 23:59:59'
GROUP BY ref.id
ORDER BY sess.time DESC

The query works fine, but its using a temporary table. Any ideas?

The row_id is the primary on both tables. sess.time, sess.ip, and ref.id are all indexes.

A: 

It's probably using a temporary table because of the GROUP_CONCAT. But is that a problem really? Is the query too slow or do you simply dislike temporary tables?

WoLpH
I try to avoid temporary tables when working with so many rows (50k+ per day). I will need to pull this across days/months/years at times. I guess maybe its time to create another summary table?
jwzk
If you need this for longer periods with more data than I would definately recommend creating summary table(s). I don't see a way to get this query itself much simpler/faster in MySQL. Perhaps if you order by `ref.id` that it would be able to omit the temporary table but I doubt it.
WoLpH
A: 

I'm having trouble understanding how this query makes sense. Why do you use GROUP_CONCAT(ref.id) if you have GROUP BY ref.id? There can be only one value for ref.id per group by definition.

Also you ORDER BY sess.time even though sess could have multiple values for time per group. Which row in the group do you want to use for sorting?

I agree that a query that invokes a temporary table usually has a performance issue in MySQL. The temporary table often writes to disk, so you get an expensive disk I/O as part of the grouping & sorting.

Could you edit your question and show the table defintions (SHOW CREATE TABLE would be best)? Also please describe what the query is supposed to represent. Then we will have a better chance of giving you some suggestions about how to rewrite it.

Bill Karwin