Is there a faster way to select the distinct count of users from a table? Perhaps using row_number, partitioning, or cross apply?

I just can't think of it right now.

Example:

Table UsageLog

UserId     Date     StoreNumber
Alice      200901   342
Alice      200902   333
Alice      200902   112
Bob        200901   112
Bob        200902   345
Charlie    200903   322

Here's my current query:

select count(distinct userID), date
from
   UsageLog
where
   date between 200901 and 200902
group by date

My actual table has millions of rows, and all of the columns are integers.

Is there a faster way to get the list of users?

Edit:

I already have a nonclustered index on each individual column. For some reason, the execution plan shows that I am still doing a table scan. I guess I should create a clustered index on Date. I'll see how that works...
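For reference, the clustered index I have in mind would be something like this (the index name is just a placeholder):

CREATE CLUSTERED INDEX IX_UsageLog_Date ON UsageLog(Date);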

A: 

Have you tried GROUP BY?

For example:

select count(userID), userID
  from UsageLog
 where date between 200901 and 200902
 group by userID

Then look at the execution plan for both queries to compare the performance.
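If this is SQL Server (the execution plans mentioned suggest it is), one quick way to compare the two queries is to turn on I/O and time statistics before running each one. A minimal sketch:

set statistics io on;    -- reports logical/physical reads per table
set statistics time on;  -- reports CPU and elapsed time
-- run both queries here and compare the Messages output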

northpole
This will give a different result than the count of distinct userIDs; it will give you the row count for each userID.
jvanderh
+2  A: 

Composite index on Date and UserId should help quite a bit
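For example, something along these lines (the index name is only illustrative):

CREATE INDEX IX_UsageLog_Date_UserId ON UsageLog(Date, UserId);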

RC
Based on his query, he would benefit from an index on Date as well. That would be the real performance gainer.
Nissan Fan
@Nissan Fan: not really. If it is not covering, the query optimizer will prefer a clustered scan instead at a surprisingly low 'tipping point'.
Remus Rusanu
Please elaborate, Remus, on what you mean. If you have a WHERE clause in SQL Server and you put a covering index in place on the fields in the clause, you stand to gain an incredible amount of performance. With no index on Date you get a full table scan.
Nissan Fan
'Covering' is the key word here; it was missing from your original comment.
Remus Rusanu
+3  A: 

Overall I have not found any way that is faster than what you have there; COUNT(DISTINCT UserId) is a pretty basic query.

The biggest thing here would be to ensure that you have an index on the table that covers both the Date column and the UserId column.

Mitchel Sellers
+1  A: 

Use GROUP BY and make sure you have an index on the UserId column.

Draemon
+1  A: 

I ran a few quick tests.

One index on (Date, UserID): the execution plan shows an index seek, but then a sort to perform the distinct, which is slow.

One index on (UserID, Date): the execution plan shows an index scan and two compute operators, which results in the lowest cost of all the scenarios that I ran.

Other scenarios, with an index on just Date or just UserID, are more expensive than the previous one.
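For reference, the two composite indexes behind those plans would be roughly as follows (names illustrative):

CREATE INDEX IX_UsageLog_Date_UserID ON UsageLog(Date, UserID);  -- index seek, but extra sort for the distinct
CREATE INDEX IX_UsageLog_UserID_Date ON UsageLog(UserID, Date);  -- index scan, no extra sort, lowest cost here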

jvanderh
+2  A: 

COUNT(DISTINCT userID) is the way to go. The problem is that you are hitting the date index's tipping point, so your plan goes for the clustered index scan instead. See the linked Kimberly L. Tripp article for what a 'tipping point' is.

You need a covering index:

CREATE INDEX idx_UsageLog_date_user_id ON UsageLog(date) INCLUDE (userID);

A clustered index will also work, but it has other side effects as well. If a clustered index on date is OK with the rest of your data access patterns, then it is better than the covering index I propose.

Update:

The reverse-order index you tried on (userID, date) also works; it will range seek each userID. In fact, it is better than (date, userID) or (date) INCLUDE (userID) because it returns the userIDs pre-sorted, so the DISTINCT does not introduce an additional sort.

Still, I recommend going over the link I posted to understand why an index on each individual column was not helping.

Remus Rusanu
Thanks for the tip. (haha, get it?)
Jeff Meatball Yang