ansaurus

Question

Answer 1

A:

Hi, hope this works i am writing on prima vista

SELECT COUNT(*) 
FROM DocumentOutputItems 
GROUP BY DocumentId, DocumentSessionId

Best Regards, Iordan

IordanTanev 2009-09-24 12:10:44

Who voted this up? It doesn't work at all I'm afraid. I want to see one row only as the result.

Novitzky 2009-09-24 12:24:27

In order for this to give the final answer, you would have to wrap it in another SELECT COUNT(*) FROM ( ... ). Essentially this answer is just giving you another way to list the distinct values you want to count. It's no better than your original solution.

Dave Costa 2009-09-24 13:19:13

Thanks Dave. I know you can use group by instead of distinct in my case. I was wondering if you get the final result using just one query. I think is impossible but I might be wrong.

Novitzky 2009-09-24 13:32:19

Answer 2

+1 A:

What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?

It certainly works as you might expect in Oracle.

SQL> select distinct deptno, job from emp
  2  order by deptno, job
  3  /

    DEPTNO JOB
---------- ---------
        10 CLERK
        10 MANAGER
        10 PRESIDENT
        20 ANALYST
        20 CLERK
        20 MANAGER
        30 CLERK
        30 MANAGER
        30 SALESMAN

9 rows selected.


SQL> select count(*) from (
  2  select distinct deptno, job from emp
  3  )
  4  /

  COUNT(*)
----------
         9

SQL>

edit

I went down a blind alley with analytics but the answer was depressingly obvious...

SQL> select count(distinct concat(deptno,job)) from emp
  2  /

COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
                                9

SQL>

edit 2

Given the following data the concatenating solution provided above will miscount:

col1  col2
----  ----
A     AA
AA    A

So we to include a separator...

select deptno + '*' + job from emp
/

Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.

APC 2009-09-24 12:41:18

+1 from me. Thanks for your answer. My query works fine but I was wondering if I can get the final result using just one query (without using a subquery)

Novitzky 2009-09-24 13:57:33

Answer 3

+1 A:

How about something like:

select count(*)
from
  (select count(*) cnt
   from DocumentOutputItems
   group by DocumentId, DocumentSessionId) t1

Probably just does the same as you are already though but it avoids the DISTINCT.

Trevor Tippins 2009-09-24 12:46:34

Yes, you right. It does the same job as my original one.

Novitzky 2009-09-24 13:35:28

in my tests (using SET SHOWPLAN_ALL ON), it had the same execution plan and exact same TotalSubtreeCost

KM 2009-09-24 13:43:42

+1 for a nice try and the explanation.

Novitzky 2009-09-24 13:50:04

Answer 4

A:

A shorter version without the subselect is this:

SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems

It works fine in mysql, and i think the optimizer have an easier time to understand this one.

Edit: apparantly i missread mssql and mysql, sorry about that, but maybe it helps anyway.

Alexander Kjäll 2009-09-24 13:33:21

in SQL Server you get: __Msg 102, Level 15, State 1, Line 1 Incorrect syntax near ','.__

KM 2009-09-24 13:35:45

This is what I was thinking of. I want do similar thing in MSSQL if possible.

Novitzky 2009-09-24 13:38:22

@Kamil Nowicki, in SQL Server, you can only have one field in a COUNT(), in my answer I show that you can concatenate the two fields into one and try this approach. However, I'd just stick with the original since the query plans would end up the same.

KM 2009-09-24 13:45:37

Answer 5

+1 A:

if you had only one field to "DISTINCT", you could use:

SELECT COUNT(DISTINCT DocumentId) 
FROM DocumentOutputItems

and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:

    SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId)) 
    FROM DocumentOutputItems

but you'll have issues if NULLs are involved. I'd just stick with the original query.

KM 2009-09-24 13:34:03

+1 from me. Thanks but I will stick with my query as you suggested. Using "convert" can decrease performance even more.

Novitzky 2009-09-24 13:58:09

Answer 6

+1 A:

There's nothing wrong with your query, but you could also do it this way:

WITH internalQuery (Amount)
AS
(
    SELECT (0)
      FROM DocumentOutputItems
  GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
  FROM internalQuery

Bliek 2009-09-24 13:37:10

Answer 7

+1 A:

if you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.

once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.

I believe a distinct count of the computed column would be equivalent to you query

JasonHorner 2009-09-26 03:42:34

ansaurus

tags:

views:

answers:

Counting DISTINCT over multiple columns

related questions