ansaurus

Question

How to select only one full row per group in a "group by" query?

Answer 1

+4 A:

It concerns me that you want any old value for fields b and c. If they are to be meaningless why are you returning them?

If it truly doesn't matter (and I honestly can't imagine a case where I would ever want this, but it's what you said) and the values for b and c don't even have to come from the same record, group by with min or max is the way to go. It's more complicated if you want all the fields to come from one particular record.

select A, count(A) as CountDuplicates, min(B) as B , min(C) as C
from TableName as base 
group by A 
having (count(A) > 1)

HLGEM 2010-06-21 18:43:14

Ok, it may work. By the way, what I wanted to say by meaningless is that it doesn't matter inside the same group of rows. I use this data just to have *a hint* of *"what is duplicated and how much"*.

MainMa 2010-06-21 19:00:42

This may be incorrect. The B and C returned are potentially unrelated -- they may come from different records. You haven't returned an arbitrary record representing one of the A's, but fragments of two different A's.

Chris Wuestefeld 2010-06-21 21:38:57

And I stated in the answer that it would do so. The poster himself said the values didn't matter.

HLGEM 2010-06-21 21:40:24

Answer 2

A:

You can do something like this if you have id as the primary key in your table:

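-- Assumes id is the primary key: keep the row with the lowest id in each duplicated group.
select t.id, t.A, t.b, t.c, d.CountDuplicates
from TableName as t
inner join
(
    -- Grouping by A alone; grouping by A and id would put every row in its own group.
    select A, min(id) as id, count(A) as CountDuplicates
    from TableName as base
    group by A
    having (count(A) > 1)
) d on t.id = d.id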

Pranay Rana 2010-06-21 18:43:18

Downvote - I think you're guilty of shot-gunning an answer just to be first. Obviously you didn't test this code, since it says "form" and "innet join". Also, the fact that it relies on A being a unique key (but not primary key as you said) makes it a bad general solution.

Chris Wuestefeld 2010-06-21 21:51:49

Answer updated now... thanks for the info.

Pranay Rana 2010-06-22 05:01:25

Answer 3

+1 A:

The ROW_NUMBER function in a CTE is the way to do this. For example:

DECLARE @mytab TABLE (A INT, B INT, C INT)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 1, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 1, 2)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 2, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (1, 3, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (2, 2, 2)
INSERT INTO @mytab ( A, B, C ) VALUES (3, 3, 1)
INSERT INTO @mytab ( A, B, C ) VALUES (3, 3, 2)
INSERT INTO @mytab ( A, B, C ) VALUES (3, 3, 3)
;WITH numbered AS 
(
    SELECT *, rn=ROW_NUMBER() OVER (PARTITION BY A ORDER BY B, C)
        FROM @mytab AS m
)
SELECT *
    FROM numbered
    WHERE rn=1

As I mentioned in my comment to HLGEM and Philip Kelley, their simple use of an aggregate function does not necessarily return one "solid" record for each A group; instead, it may return column values from many separate rows, all stitched together as if they were a single record. For example, if this were a PERSON table, with the PersonID being the "A" column, and distinct contact records (say, Home and Work), you might wind up returning the person's home city, but their office ZIP code -- and that's clearly asking for trouble.
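
A minimal sketch of that stitching effect, reusing the sample rows from above. The multi-row VALUES syntax assumes SQL Server 2008 or later, and MAX is used instead of MIN only because it makes the mismatch visible with this particular data:

```sql
-- Same sample data as in the example above.
DECLARE @mytab TABLE (A INT, B INT, C INT);
INSERT INTO @mytab (A, B, C) VALUES
    (1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 3, 1),
    (2, 2, 2),
    (3, 3, 1), (3, 3, 2), (3, 3, 3);

-- Each aggregate is evaluated independently within the group,
-- so B and C may be taken from different rows.
SELECT A, MAX(B) AS B, MAX(C) AS C
    FROM @mytab
    GROUP BY A
    ORDER BY A;
-- For A = 1 this yields B = 3 and C = 2, yet no row (1, 3, 2) exists in the table.
```

With MIN the result for A = 1 happens to be (1, 1, 1), which is a real row, so whether the problem shows up depends entirely on the data.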

The use of the ROW_NUMBER, in conjunction with a CTE here, is a little difficult to get used to at first because the syntax is awkward. But it's becoming a pretty common pattern, so it's good to get to know it.

In my sample I've defined a CTE that tacks an extra column rn (standing for "row number") onto the table; the numbering restarts at 1 within each group of rows that share the same A value. A SELECT on that result, filtering to only those rows with a row number of 1 (i.e., the first record for that value of A under the ORDER BY B, C ordering), returns a "solid" record for each A group -- in the PERSON example above, you'd be certain to get either the Work or Home address, but not elements of both mixed together.
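
Since the question is about getting a hint of what is duplicated and how much, the same pattern can probably be extended with a windowed COUNT. This is only a sketch: it assumes the sample table above stands in for the real one, borrows the column name CountDuplicates from the other answers, and uses multi-row VALUES syntax that requires SQL Server 2008 or later:

```sql
-- Same sample data as in the example above.
DECLARE @mytab TABLE (A INT, B INT, C INT);
INSERT INTO @mytab (A, B, C) VALUES
    (1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 3, 1),
    (2, 2, 2),
    (3, 3, 1), (3, 3, 2), (3, 3, 3);

WITH numbered AS
(
    SELECT A, B, C,
           -- Position of the row within its A group, ordered by B, then C.
           rn = ROW_NUMBER() OVER (PARTITION BY A ORDER BY B, C),
           -- Total number of rows sharing this A value.
           CountDuplicates = COUNT(*) OVER (PARTITION BY A)
        FROM @mytab
)
SELECT A, B, C, CountDuplicates
    FROM numbered
    WHERE rn = 1 AND CountDuplicates > 1
    ORDER BY A;
-- Returns (1, 1, 1, 4) and (3, 3, 1, 3); A = 2 is dropped because it occurs only once.
```

Both window functions are computed over the same partition, so each returned row is an actual record from the table with its group size attached.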

Chris Wuestefeld 2010-06-21 21:49:16
