ansaurus

Question

SQL Count(*) and Group By - Find Difference Between Rows

Answer 1

+1 A:

If you know 624 is the magic number:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> 624
ORDER BY proc_id;

David M 2009-07-22 20:31:15

Answer 2

+2 A:

First you need to define the criterion that makes '624' correct. Is it the average count(*)? Is it the count(*) that occurs most often? Is it your favorite count(*)?

Then you can use the HAVING clause to separate the ones that don't match your criteria:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> 624
ORDER BY proc_id;

or:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> (
  <insert here a subquery that produces the magic '624'>
 )
ORDER BY proc_id;
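
As a sketch of one way to fill in that placeholder, assuming the "correct" count is the one that occurs most often (that is only one of the criteria listed above, and the question does not confirm it). The CTE names counts and freq are made up for illustration, and ties are broken by taking the smallest of the most frequent counts:

```sql
WITH counts AS (
    -- one row per proc_id with its row count
    SELECT proc_id, COUNT(*) AS row_count
    FROM proc
    WHERE grouping_primary = 'SLB'
    AND   eff_date = '01-JUL-09'
    GROUP BY proc_id
),
freq AS (
    -- how many proc_ids share each row count
    SELECT row_count, COUNT(*) AS n
    FROM counts
    GROUP BY row_count
)
SELECT c.proc_id, c.row_count
FROM counts c
WHERE c.row_count <> (
    -- the most common row count (smallest one if several tie)
    SELECT MIN(f.row_count)
    FROM freq f
    WHERE f.n = (SELECT MAX(n) FROM freq)
)
ORDER BY c.proc_id;
```

Because the counts are computed once in the CTE, the filter moves from HAVING to a plain WHERE on the precomputed row_count.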

Remus Rusanu 2009-07-22 20:33:38

+1 because of the subquery

Philip Kelley 2009-07-22 21:09:19

Answer 3

A:

Try this:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
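-- compare against the count for a reference proc_id, using the same filters as the outer query
HAVING count(*) <> (select count(*)
                    from proc z
                    where z.grouping_primary = 'SLB'
                    and   z.eff_date = '01-JUL-09'
                    and   z.proc_id in (1))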
ORDER BY proc_id;

northpole 2009-07-22 20:33:53

Answer 4

A:

You can't do this. Some procIds have fewer rows than others. In other words, the rows that keep such a procId from having a count of 624 are rows that DO NOT EXIST. How can any query show those rows?

For the procIds that have too many rows, IF (and this is a big if) all the rows in the 624 for the other procIds share some attribute with a 624-row subset of the sets that are too large, then you might be able to identify the "extra" rows. But there is no way to identify missing rows; all you can do is identify which procIds have too many rows or too few.
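
A minimal sketch of that last point, assuming 624 really is the expected count (the status column and its labels are made up for illustration):

```sql
SELECT proc_id,
       COUNT(*) AS row_count,
       -- label each off-count proc_id by the direction of the mismatch
       CASE WHEN COUNT(*) > 624 THEN 'too many' ELSE 'too few' END AS status
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING COUNT(*) <> 624
ORDER BY proc_id;
```

This only reports which procIds are off and in which direction; it says nothing about which individual rows are extra or missing.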

Charles Bretana 2009-07-22 20:43:01

Answer 5

A:

If I understand your question correctly (and I read it differently than the other posted answers do), you want the rows that make proc_id 01 different? If that's the case, you need to join on all the columns that should be the same, and look for the differences. So, to compare 01 with 02:

 SELECT * /* both sides, so rows that exist only in [02] are not returned as all NULLs */
 FROM (
    SELECT * FROM proc
    WHERE grouping_primary = 'SLB'
    AND eff_date = '01-JUL-09'
    AND proc_id = '01'
 ) as [01]
 FULL JOIN (
    SELECT * FROM proc
    WHERE grouping_primary = 'SLB'
    AND eff_date = '01-JUL-09'
    AND proc_id = '02'
 ) as [02] ON
    [01].col1 = [02].col1
    AND [01].col2 = [02].col2
    AND [01].col3 = [02].col3
    /* etc...just don't include proc_id */
 WHERE
    [01].proc_id IS NULL --row exists only in [02]
    OR [02].proc_id IS NULL --row exists only in [01]

I'm pretty sure MS SQL Server has a row hash function that may make this easier if you have a lot of columns, but I can't think of its name.

Mark Brackett 2009-07-22 20:44:03

Except that, as I understand the question, it is not the column values that make it different, but simply the count of rows with that procId, regardless of the column values.

Charles Bretana 2009-07-22 20:57:19

CHECKSUM is the magic row hash function
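
A rough sketch of how CHECKSUM could stand in for the column-by-column comparison, assuming SQL Server and the same placeholder columns col1, col2, col3 used in the answer. Checksums can collide, so two different rows may occasionally look identical; treat the result as a first pass rather than a guarantee:

```sql
-- rows of proc_id '01' whose checksum has no counterpart in proc_id '02'
SELECT a.*
FROM proc a
WHERE a.grouping_primary = 'SLB'
AND   a.eff_date = '01-JUL-09'
AND   a.proc_id = '01'
AND   CHECKSUM(a.col1, a.col2, a.col3) NOT IN (
          SELECT CHECKSUM(b.col1, b.col2, b.col3)
          FROM proc b
          WHERE b.grouping_primary = 'SLB'
          AND   b.eff_date = '01-JUL-09'
          AND   b.proc_id = '02'
      );
```

Swapping '01' and '02' gives the rows that exist only on the other side.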

Mark Brackett 2009-07-24 15:20:20

Answer 6

A:

Well, to find the extra rows you would use a NOT IN predicate. To find the missing rows you would need to reverse the logic. This naturally assumes that all 624 rows are the same from proc_id to proc_id.

SELECT proc_id, varying_column
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
AND   varying_column NOT IN (SELECT b.varying_column
                             FROM proc b
                             WHERE b.grouping_primary = 'SLB'
                             AND   b.eff_date = '01-JUL-09'
                             -- reference set: the lowest proc_id that has exactly 624 rows
                             AND   b.proc_id = (SELECT MIN(ok.proc_id)
                                                FROM (SELECT a.proc_id
                                                      FROM proc a
                                                      WHERE a.grouping_primary = 'SLB'
                                                      AND   a.eff_date = '01-JUL-09'
                                                      GROUP BY a.proc_id
                                                      HAVING COUNT(*) = 624) ok))
ORDER BY proc_id, varying_column;
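
For the missing rows, the reversed logic might look something like the sketch below. It assumes proc_id '01' is known to hold the full, correct set of 624 rows (a hypothetical choice; substitute whichever proc_id you trust) and that varying_column identifies a row within a proc_id:

```sql
SELECT p.proc_id, r.varying_column
FROM (SELECT DISTINCT proc_id
      FROM proc
      WHERE grouping_primary = 'SLB'
      AND   eff_date = '01-JUL-09') p
CROSS JOIN
     (SELECT DISTINCT varying_column
      FROM proc
      WHERE grouping_primary = 'SLB'
      AND   eff_date = '01-JUL-09'
      AND   proc_id = '01') r   -- assumed reference proc_id
-- keep only the (proc_id, value) pairs that have no actual row
WHERE NOT EXISTS (SELECT 1
                  FROM proc x
                  WHERE x.grouping_primary = 'SLB'
                  AND   x.eff_date = '01-JUL-09'
                  AND   x.proc_id = p.proc_id
                  AND   x.varying_column = r.varying_column)
ORDER BY p.proc_id, r.varying_column;
```

Each returned pair is a value the reference proc_id has but the listed proc_id lacks.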

2009-07-22 21:12:15
