views:

1159

answers:

7

Is any better way of doing a query like this:

SELECT COUNT(*) 
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
      FROM DocumentOutputItems) AS internalQuery

I need to count the number of distinct items from this table but the distinct is over two columns. I hope that makes sense.

A: 

Hi, hope this works i am writing on prima vista

SELECT COUNT(*) 
FROM DocumentOutputItems 
GROUP BY DocumentId, DocumentSessionId

Best Regards, Iordan

IordanTanev
Who voted this up? It doesn't work at all I'm afraid. I want to see one row only as the result.
Novitzky
In order for this to give the final answer, you would have to wrap it in another SELECT COUNT(*) FROM ( ... ). Essentially this answer is just giving you another way to list the distinct values you want to count. It's no better than your original solution.
Dave Costa
Thanks Dave. I know you can use group by instead of distinct in my case. I was wondering if you get the final result using just one query. I think is impossible but I might be wrong.
Novitzky
+1  A: 

What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?

It certainly works as you might expect in Oracle.

SQL> select distinct deptno, job from emp
  2  order by deptno, job
  3  /

    DEPTNO JOB
---------- ---------
        10 CLERK
        10 MANAGER
        10 PRESIDENT
        20 ANALYST
        20 CLERK
        20 MANAGER
        30 CLERK
        30 MANAGER
        30 SALESMAN

9 rows selected.


SQL> select count(*) from (
  2  select distinct deptno, job from emp
  3  )
  4  /

  COUNT(*)
----------
         9

SQL>

edit

I went down a blind alley with analytics but the answer was depressingly obvious...

SQL> select count(distinct concat(deptno,job)) from emp
  2  /

COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
                                9

SQL>

edit 2

Given the following data the concatenating solution provided above will miscount:

col1  col2
----  ----
A     AA
AA    A

So we to include a separator...

select deptno + '*' + job from emp
/

Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.

APC
+1 from me. Thanks for your answer. My query works fine but I was wondering if I can get the final result using just one query (without using a subquery)
Novitzky
+1  A: 

How about something like:

select count(*)
from
  (select count(*) cnt
   from DocumentOutputItems
   group by DocumentId, DocumentSessionId) t1

Probably just does the same as you are already though but it avoids the DISTINCT.

Trevor Tippins
Yes, you right. It does the same job as my original one.
Novitzky
in my tests (using SET SHOWPLAN_ALL ON), it had the same execution plan and exact same TotalSubtreeCost
KM
+1 for a nice try and the explanation.
Novitzky
A: 

A shorter version without the subselect is this:

SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems

It works fine in mysql, and i think the optimizer have an easier time to understand this one.

Edit: apparantly i missread mssql and mysql, sorry about that, but maybe it helps anyway.

Alexander Kjäll
in SQL Server you get: __Msg 102, Level 15, State 1, Line 1 Incorrect syntax near ','.__
KM
This is what I was thinking of. I want do similar thing in MSSQL if possible.
Novitzky
@Kamil Nowicki, in SQL Server, you can only have one field in a COUNT(), in my answer I show that you can concatenate the two fields into one and try this approach. However, I'd just stick with the original since the query plans would end up the same.
KM
+1  A: 

if you had only one field to "DISTINCT", you could use:

SELECT COUNT(DISTINCT DocumentId) 
FROM DocumentOutputItems

and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:

    SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId)) 
    FROM DocumentOutputItems

but you'll have issues if NULLs are involved. I'd just stick with the original query.

KM
+1 from me. Thanks but I will stick with my query as you suggested. Using "convert" can decrease performance even more.
Novitzky
+1  A: 

There's nothing wrong with your query, but you could also do it this way:

WITH internalQuery (Amount)
AS
(
    SELECT (0)
      FROM DocumentOutputItems
  GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
  FROM internalQuery
Bliek
+1  A: 

if you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.

once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.

I believe a distinct count of the computed column would be equivalent to you query

JasonHorner