tags:

views:

166

answers:

4

Suppose I've got a table:

Role
----
person_id company_id financial_year

How can I tell the following:

  1. Whether each person_id occurs at most once per company_id per financial_year in this table
  2. If 1. is false, which person_id's and company_id's and financial_year's co-occur more than once

Edit 1: Edited to add financial_year col

Edit 2: the RDBMS platform here happens to be MySQL, though I don't expect that this would require much vendor-specific SQL

A: 

This should do what you need:

select left.person_id, left.company_id, left.financial_year, count(*)
from role left
inner join role right
    on left.person_id = right.person_id 
        and left.company_id = right.company_id
        and left.financial_year = right.financial_year
group by left.person_id, left.company_id, left.financial_year

Note that this is T-SQL (MS), but the only thing I know of that might change is the table alias syntax, as the rest is ANSI SQL. This will return one row per repeated person/company/year combination, with a count of the number of times that combination is repeated (though the count wasn't mentioned in the question, I know it can sometimes be useful).

Harper Shelby
As written, it doesn't sort out financial years. It will also return each entry in role because that row can be joined with itself.
Jonathan Leffler
I see the question was edited to add financial year - maybe you answered before that...
Jonathan Leffler
I did - I'll add that in!
Harper Shelby
+4  A: 

For the first, it's generally a good idea to just have a grouping which you can then filter on if you want:

select
  r.company_id, r.person_id, r.financial_year, count(r.person_id) 
from
  Role as r 
group by
  r.company_id, r.person_id, r.financial_year

For the second, you can just modify the above like so:

select
  r.company_id, r.person_id, r.financial_year, count(r.person_id) 
from
  Role as r 
group by
  r.company_id, r.person_id, r.financial_year
having
  count(r.person_id) > 1
casperOne
The having clause should come after the Group by.
jackrabbit
A: 

I think this will do it for #1:

select count(*), count(distinct person_id, company_id, financial_year)
    from role

(Edit: If the two count()'s are different, then the table contains multiple rows per unique combination of the three columns, what I was asking in question #1. Take the difference of them to get the number of such rows.)

and casperOne's answer will do it for #2

ʞɔıu
It lists everyone; you need a clause to limit the output to duplicates. That might be the HAVING clause.
Jonathan Leffler
You can't put a list of columns into the COUNT() function.
Bill Karwin
Bill K, you can indeed put multiple cols in a count(distinct ...). This query works.
ʞɔıu
Jon L, not sure what you mean exactly. Question 1 was asking whether the table contained any such dupe rows, question 2 was what those rows were if they did exist.
ʞɔıu
A: 

Yes, in general, to detect duplicates,

Select [ColumnList you want to be unique]
From Table
Group By [SameColumn List]
Having Count(*) > 1

In your specific case

Select person_id, company_id, financial_year
From Table
Group By person_id, company_id, financial_year
Having Count(*) > 1

or, for your sub-question (1) about Whether each person_id occurs at most once per company_id per financial_year in this table

Select company_id, financial_year
From Table
Group By company_id, financial_year
Having Count(Person_Id) > 1

and for (2): (when (1) is false, which person_id's and company_id's and financial_year's co-occur more than once

Select person_id, company_id, financial_year
From Table T
Where Not Exists 
     (Select * From Table
      Where company_id = T.company_id 
         And financial_year = T.financial_year          
      Having Count(Person_Id) > 1)
Group By person_id, company_id, financial_year
Having Count(*) > 1
Charles Bretana
(a) The table was (unusually - but the questioner is to be complimented for doing so) given the name of Role. (b) Queries 1 and 2 look good. Your third query is not what I'd expect, and hence leads to query 4 not being what I'd expect either.
Jonathan Leffler
yes, in re-reading, I don't see the distinction between the query listed as (1), and just the count of distinct instances of the entire triplet...
Charles Bretana