Below is a SQL query I wrote to find the total number of rows by each Product ID (proc_id):

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
ORDER BY proc_id;

Below is the result of the SQL query above:

proc_id count(*)
01  626
02  624
03  626
04  624
05  622
06  624
07  624
09  624

Notice that the counts for proc_id = '01', proc_id = '03', and proc_id = '05' differ from the 624 rows that every other proc_id has.

How do I write a SQL query to find which rows make proc_id = '01', proc_id = '03', and proc_id = '05' differ from the other proc_id values?

+1  A: 

If you know 624 is the magic number:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> 624
ORDER BY proc_id;
David M
+2  A: 

First you need to define the criterion that makes '624' correct. Is it the average count(*)? Is it the count(*) that occurs most often? Is it your favorite count(*)?

Then you can use the HAVING clause to separate the ones that don't match your criteria:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> 624
ORDER BY proc_id;

or:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> (
  <insert here a subquery that produces the magic '624'>
 )
ORDER BY proc_id;
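
One possible way to fill in that subquery, assuming "correct" means the count that occurs most often. This is only a sketch: it relies on window functions (RANK), so it assumes a database that supports them, and the CTE names counts and freq are made up for illustration. If two counts tie for most frequent, both are treated as expected.

```sql
-- Per-proc_id row counts for the slice the question looks at.
WITH counts AS (
    SELECT proc_id, COUNT(*) AS cnt
    FROM proc
    WHERE grouping_primary = 'SLB'
    AND   eff_date = '01-JUL-09'
    GROUP BY proc_id
),
-- How many proc_ids share each count, ranked by popularity (1 = most common).
freq AS (
    SELECT cnt, RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM counts
    GROUP BY cnt
)
-- Keep only the proc_ids whose count is not the most common one.
SELECT c.proc_id, c.cnt
FROM counts c
WHERE c.cnt NOT IN (SELECT f.cnt FROM freq f WHERE f.rnk = 1)
ORDER BY c.proc_id;
```

With the counts shown in the question, 624 occurs five times, so this should list proc_id 01, 03, and 05.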
Remus Rusanu
+1 because of the subquery
Philip Kelley
A: 

Try this:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
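HAVING count(*) <> (
  -- reference count: proc_id '02' is assumed to hold the expected 624 rows
  SELECT count(*)
  FROM proc z
  WHERE z.grouping_primary = 'SLB'
  AND   z.eff_date = '01-JUL-09'
  AND   z.proc_id = '02'
)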
ORDER BY proc_id;
northpole
A: 

You can't do this. Some proc_ids simply have fewer rows than the others. In other words, the rows that keep such a proc_id from reaching a count of 624 are rows that DO NOT EXIST. How can any query show those rows?

For the proc_ids that have too many rows, IF (and this is a big if) all the rows in the 624 for the other proc_ids share some attribute with a 624-row subset of the sets that are too large, then you might be able to identify the "extra" rows. But there is no way to identify missing rows; all you can do is identify which proc_ids have too many rows or too few.

Charles Bretana
A: 

If I understand your question correctly (which is differently from the other posted answers), you want the rows that make proc_id 01 different? If that's the case, you need to join on all the columns that should be the same, and look for the differences. So, to compare 01 with 02:

 SELECT [01].*, [02].*
 FROM (
    SELECT * FROM proc
    WHERE grouping_primary = 'SLB'
    AND eff_date = '01-JUL-09'
    AND proc_id = '01'
 ) as [01]
 FULL JOIN (
    SELECT * FROM proc
    WHERE grouping_primary = 'SLB'
    AND eff_date = '01-JUL-09'
    AND proc_id = '02'
 ) as [02] ON
    [01].col1 = [02].col1
    AND [01].col2 = [02].col2
    AND [01].col3 = [02].col3
    /* etc...just don't include proc_id */
 WHERE
    [01].proc_id IS NULL --row exists only in [02]
    OR [02].proc_id IS NULL --row exists only in [01]

I'm pretty sure MS SQL Server has a row hash function that may make this easier if you have a bunch of columns, but I can't think of its name.

Mark Brackett
Except that, as I understand the question, it is not the column values that make it different, but simply the count of rows with that proc_id, regardless of the column values.
Charles Bretana
CHECKSUM is the magic row hash function
Mark Brackett
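
A rough sketch of how CHECKSUM might be combined with the full join above, for SQL Server only. col1, col2, and col3 are placeholders for the columns that should match, and row_hash is a made-up alias. CHECKSUM is not collision-free, so a collision can hide a genuine difference; treat the output as a first pass rather than proof.

```sql
-- Hash the comparable columns once per row, then join on the hash
-- instead of spelling out every column in the ON clause.
SELECT COALESCE(a.proc_id, b.proc_id)   AS proc_id,
       COALESCE(a.row_hash, b.row_hash) AS row_hash
FROM (
    SELECT proc_id, CHECKSUM(col1, col2, col3) AS row_hash
    FROM proc
    WHERE grouping_primary = 'SLB'
    AND   eff_date = '01-JUL-09'
    AND   proc_id = '01'
) AS a
FULL JOIN (
    SELECT proc_id, CHECKSUM(col1, col2, col3) AS row_hash
    FROM proc
    WHERE grouping_primary = 'SLB'
    AND   eff_date = '01-JUL-09'
    AND   proc_id = '02'
) AS b ON a.row_hash = b.row_hash
WHERE a.proc_id IS NULL   -- row exists only under '02'
   OR b.proc_id IS NULL;  -- row exists only under '01'
```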
A: 

Well, to find the extra rows you would use a NOT IN subquery against a proc_id that has the expected 624 rows. To find the missing rows you would need to reverse the logic. This naturally assumes that all 624 rows are the same from proc_id to proc_id.

SELECT proc_id, varying_column
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
AND   varying_column NOT IN (SELECT b.varying_column
                             FROM proc b
                             WHERE b.grouping_primary = 'SLB'
                             AND   b.eff_date = '01-JUL-09'
                             -- reference proc_id: the lowest one with exactly 624 rows
                             AND   b.proc_id = (SELECT MIN(g.proc_id)
                                                FROM (SELECT a.proc_id
                                                      FROM proc a
                                                      WHERE a.grouping_primary = 'SLB'
                                                      AND   a.eff_date = '01-JUL-09'
                                                      GROUP BY a.proc_id
                                                      HAVING COUNT(*) = 624) g))
ORDER BY proc_id, varying_column;
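
The reversed logic for missing rows could look roughly like this. It is a sketch under two assumptions: proc_id '02' is taken as the known-good reference, and varying_column (a stand-in name, as in the query above) identifies a row within a proc_id. It uses NOT EXISTS rather than NOT IN so that a NULL in varying_column does not silently empty the result.

```sql
-- Rows the reference proc_id '02' has that the short proc_id '05' lacks.
SELECT r.varying_column
FROM proc r
WHERE r.grouping_primary = 'SLB'
AND   r.eff_date = '01-JUL-09'
AND   r.proc_id = '02'
AND   NOT EXISTS (SELECT 1
                  FROM proc s
                  WHERE s.grouping_primary = 'SLB'
                  AND   s.eff_date = '01-JUL-09'
                  AND   s.proc_id = '05'
                  AND   s.varying_column = r.varying_column)
ORDER BY r.varying_column;
```

Swap '05' for any other proc_id whose count falls below 624.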