ansaurus

Question

Select most frequently occurring records using two or more grouping columns

Answer 1

A:

Someone better with subqueries than I am can probably clean this up, but in my testing it produced the result you want:

SELECT
  main.Department as Department,
  (SELECT 
     Category
   FROM yourtable
   WHERE Department=main.Department
   GROUP BY Category
   ORDER BY COUNT(Category) DESC
   LIMIT 1) AS Category
FROM yourtable main
GROUP BY main.Department

The trick is to make the subquery return only the one row with the highest count, which is what the ORDER BY ... DESC combined with "LIMIT 1" does.
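To see the ORDER BY plus LIMIT 1 trick in action, here is a minimal sketch that runs the same query shape against an in-memory SQLite database through Python's sqlite3 module. Several things here are assumptions made for the demo and are not part of the answer above: the use of SQLite, the sample rows, the helper name most_frequent_categories, the alias m (used instead of main, which is also the name of SQLite's default schema), and the extra Category sort key, which picks the smallest category when counts tie.

```python
import sqlite3

# Hypothetical sample data: (Department, Category) pairs.
ROWS = [
    ("0001", "A"),
    ("0002", "D"),
    ("0003", "A"),
    ("0003", "A"),
    ("0003", "C"),
    ("0004", "B"),
    ("0006", "O"),  # 0006 is a tie between O and P
    ("0006", "P"),
]

# Same shape as the query above. The second ORDER BY key is an added
# assumption: it makes the tie-break deterministic (smallest Category wins).
QUERY = """
SELECT
  m.Department AS Department,
  (SELECT Category
   FROM yourtable
   WHERE Department = m.Department
   GROUP BY Category
   ORDER BY COUNT(Category) DESC, Category
   LIMIT 1) AS Category
FROM yourtable m
GROUP BY m.Department
ORDER BY m.Department
"""


def most_frequent_categories(rows):
    """Load rows into a throwaway table and return (Department, Category) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE yourtable (Department TEXT, Category TEXT)")
        conn.executemany("INSERT INTO yourtable VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for department, category in most_frequent_categories(ROWS):
        print(department, category)
```

Without the second sort key, which of the tied categories comes back for department 0006 is up to the database engine.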

Renesis 2009-07-30 16:57:09

This syntax doesn't work in Oracle, which does not support the LIMIT clause.

APC 2009-08-05 16:39:34

Answer 2

+3 A:

This works in both Oracle and SQL Server. I believe it is all standard SQL, although it relies on features from the later standards:

with T_with_RN as
    (select Department
     , Category
     , row_number() over (partition by Department order by count(*) Desc) as RN
    from T
    group by Department, Category)
select Department, Category
from T_with_RN
where RN = 1

EDIT: I don't know why I used WITH; the solution is probably easier to read as an inline view:

select Department, Category
from (select Department
    , Category
    , row_number() over (partition by Department order by count(*) Desc) as RN
    from T
    group by Department, Category) T_with_RN
where RN = 1

END EDIT

Test cases:

create table T (
    Department varchar(10) null,
    Category varchar(10) null
);

-- Original test case
insert into T values ('0001', 'A');
insert into T values ('0002', 'D');
insert into T values ('0003', 'A');
insert into T values ('0003', 'A');
insert into T values ('0003', 'C');
insert into T values ('0004', 'B');
-- Null Test cases:
insert into T values (null, 'A');
insert into T values (null, 'B');
insert into T values (null, 'B');
insert into T values ('0005', null);
insert into T values ('0005', null);
insert into T values ('0005', 'X');
-- Tie Test case
insert into T values ('0006', 'O');
insert into T values ('0006', 'P');

Shannon Severance 2009-07-30 16:59:40

The original question doesn't mention how to handle ties. Your solution selects one of the tied rows, effectively at random. I think it would be better to use RANK() instead of ROW_NUMBER(), as that returns all rows which tie for first place.

APC 2009-08-05 16:42:04

@APC: He does mention ties: "In the case of a tie, I'd just select the min/max of the category arbitrarily." I took that to mean he didn't care which of the tied categories was chosen. You are correct that rank() would return all of the tied rows.

Shannon Severance 2009-08-05 18:24:50
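The difference between row_number() and rank() discussed in these comments can be checked with a small sketch. It runs the inline-view query from the answer twice, once with each function, against an in-memory SQLite database via Python's sqlite3 module. The choice of SQLite (which needs version 3.25 or later for window functions), the helper name top_categories, the {fn} placeholder, and the outer ORDER BY added for stable output are assumptions of this sketch. The rows are a subset of the answer's test cases.

```python
import sqlite3

# Subset of the answer's test cases: one clear winner (0003) and one tie (0006).
ROWS = [
    ("0003", "A"),
    ("0003", "A"),
    ("0003", "C"),
    ("0006", "O"),
    ("0006", "P"),
]

# The inline-view query from the answer; {fn} is replaced with either
# row_number() or rank(). The final ORDER BY is added only for stable output.
TEMPLATE = """
select Department, Category
from (select Department
    , Category
    , {fn} over (partition by Department order by count(*) desc) as RN
    from T
    group by Department, Category) T_with_RN
where RN = 1
order by Department, Category
"""


def top_categories(rows, ranking_function):
    """Return the top-ranked (Department, Category) rows for the given function."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table T (Department varchar(10) null, Category varchar(10) null)"
        )
        conn.executemany("insert into T values (?, ?)", rows)
        return conn.execute(TEMPLATE.format(fn=ranking_function)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print("row_number():", top_categories(ROWS, "row_number()"))
    print("rank():      ", top_categories(ROWS, "rank()"))
```

With row_number(), department 0006 contributes exactly one row, and which of O or P it is depends on the engine. With rank(), both tied rows share rank 1, so both come back.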
