views:

119

answers:

5

I've been working with SQL Server for the better part of a decade, and this grouping (or partitioning, or ranking...I'm not sure what the answer is!) one has me stumped. Feels like it should be an easy one, too. I'll generalize my problem:

Let's say I have 3 employees (don't worry about them quitting or anything...there's always 3), and I keep up with how I distribute their salaries on a monthly basis.

Month   Employee  PercentOfTotal
--------------------------------
1       Alice     25%
1       Barbara   65%
1       Claire    10%

2       Alice     25%
2       Barbara   50%
2       Claire    25%

3       Alice     25%
3       Barbara   65%
3       Claire    10%

As you can see, I've paid them the same percent in Months 1 and 3, but in Month 2, I've given Alice the same 25%, but Barbara got 50% and Claire got 25%.

What I want to know is all the distinct distributions I've ever given. In this case there would be two -- one for months 1 and 3, and one for month 2.

I'd expect the results to look something like this (NOTE: the ID, or sequencer, or whatever, doesn't matter)

ID      Employee  PercentOfTotal
--------------------------------
X       Alice     25%
X       Barbara   65%
X       Claire    10%

Y       Alice     25%
Y       Barbara   50%
Y       Claire    25%

Seems easy, right? I'm stumped! Anyone have an elegant solution? I just put together this solution while writing this question, which seems to work, but I'm wondering if there's a better way. Or maybe a different way from which I'll learn something.

WITH temp_ids (Month)
AS
(
  SELECT DISTINCT MIN(Month)
    FROM employees_paid
  GROUP BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
  FROM employees_paid EMP
         JOIN temp_ids IDS ON EMP.Month = IDS.Month
GROUP BY EMP.Month, EMP.Employee, EMP.PercentOfTotal

Thanks y'all! -Ricky

+4  A: 

This gives you an answer in a slightly different format than you requested:

SELECT DISTINCT
    T1.PercentOfTotal AS Alice,
    T2.PercentOfTotal AS Barbara,
    T3.PercentOfTotal AS Claire
FROM employees_paid T1
JOIN employees_paid T2
  ON T1.Month = T2.Month AND T1.Employee = 'Alice' AND T2.Employee = 'Barbara'
JOIN employees_paid T3
  ON T2.Month = T3.Month AND T3.Employee = 'Claire'

Result:

Alice   Barbara  Claire
25%     50%      25%
25%     65%      10%

If you want to, you can use UNPIVOT to turn this result set into the form you asked for.

SELECT rn AS ID, Employee, PercentOfTotal
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Alice) AS rn
    FROM (
        SELECT DISTINCT
            T1.PercentOfTotal AS Alice,
            T2.PercentOfTotal AS Barbara,
            T3.PercentOfTotal AS Claire
        FROM employees_paid T1
        JOIN employees_paid T2 ON T1.Month = T2.Month AND T1.Employee = 'Alice'
                                                      AND T2.Employee = 'Barbara'
        JOIN employees_paid T3 ON T2.Month = T3.Month AND T3.Employee = 'Claire'
    ) T1
) p UNPIVOT (PercentOfTotal FOR Employee IN (Alice, Barbara, Claire)) AS unpvt

Result:

ID  Employee  PercentOfTotal  
1   Alice     25%
1   Barbara   50%      
1   Claire    25%             
2   Alice     25%             
2   Barbara   65%              
2   Claire    10%               
Mark Byers
Thanks for the UNPIVOT suggestion -- something I hadn't used before.
+2  A: 

If I have understood you correctly then, for a general solution, I think you would need to concatenate the whole group together - e.g. to produce Alice:0.25, Barbara:0.50, Claire:0.25. Then select the distinct groups so something like the following would do it (rather clunkily).

WITH EmpSalaries
AS
(

SELECT 1 AS Month, 'Alice' AS Employee, 0.25 AS PercentOfTotal UNION ALL
SELECT 1 AS Month, 'Barbara' AS Employee, 0.65 UNION ALL
SELECT 1 AS Month, 'Claire' AS Employee, 0.10 UNION ALL

SELECT 2 AS Month, 'Alice' AS Employee, 0.25 UNION ALL
SELECT 2 AS Month, 'Barbara' AS Employee, 0.50 UNION ALL
SELECT 2 AS Month, 'Claire' AS Employee, 0.25 UNION ALL

SELECT 3 AS Month,  'Alice' AS Employee, 0.25 UNION ALL
SELECT 3 AS Month,  'Barbara' AS Employee, 0.65 UNION ALL
SELECT 3 AS Month,  'Claire' AS Employee, 0.10 
),
Months AS 
(
SELECT DISTINCT Month FROM EmpSalaries
),
MonthlySummary AS
(
SELECT Month,
Stuff(
            (
            Select ', ' + S1.Employee + ':' + cast(PercentOfTotal as varchar(20))
            From EmpSalaries As S1
            Where S1.Month = Months.Month
            Order By S1.Employee
            For Xml Path('')
            ), 1, 2, '') As Summary
FROM Months
)
SELECT * FROM EmpSalaries
WHERE Month IN (SELECT MIN(Month)
                FROM MonthlySummary
                GROUP BY Summary)
Martin Smith
Right on -- this is similar to how my client currently pulls these numbers in their system, parsing the string afterward. I'm in the midst of moving their old data to our new system, which normalizes this, eliminating the need. I figured there might be a "simple" solution that returned the table values -- looks like it's not as much a common scenario as I thought!
+3  A: 

What you want is for each month's distribution to act as a signature or pattern of values which you would then want to find in other months. What is not clear is whether the employee to whom the value went is as important as the break down of percentages. For example, would Alice=65%, Barbara=25%, Claire=10% be the same as the Month 3 in your example? In my example, I presumed that it would not be the same. Similar to Martin Smith's solution, I find the signatures by multiplying each percentage by 10. This presumes that all percentage values are less than one. If someone could have a percentage of 110% for example, that would create problems for this solution.

With Employees As
    (
    Select 1 As Month, 'Alice' As Employee, .25 As PercentOfTotal
    Union All Select 1, 'Barbara', .65
    Union All Select 1, 'Claire', .10
    Union All Select 2, 'Alice', .25
    Union All Select 2, 'Barbara', .50
    Union All Select 2, 'Claire', .25
    Union All Select 3, 'Alice', .25
    Union All Select 3, 'Barbara', .65
    Union All Select 3, 'Claire', .10
    )
    , EmployeeRanks As
    (
    Select Month, Employee, PercentOfTotal
        , Row_Number() Over ( Partition By Month Order By Employee, PercentOfTotal ) As ItemRank
    From Employees
    )
    , Signatures As
    (
    Select Month
        , Sum( PercentOfTotal * Cast( Power( 10, ItemRank ) As bigint) ) As SignatureValue
    From EmployeeRanks
    Group By Month
    )
    , DistinctSignatures As
    (
    Select Min(Month) As MinMonth, SignatureValue
    From Signatures
    Group By SignatureValue
    )
Select E.Month, E.Employee, E.PercentOfTotal
From Employees As E
    Join DistinctSignatures As D
        On D.MinMonth = E.Month
Thomas
Thanks a lot -- I think this one works in the most general sense of all the answers. For my purposes, Month 1 and 3 are the same. In the end, I don't need to know which month(s) each of the distributions came from, just that there are 2 distinct distributions, and what those distributions are.
+2  A: 

I'm assuming performance won't be great (cause of the subquery)

SELECT * FROM employees_paid where Month not in (
     SELECT
          a.Month
     FROM
          employees_paid a
          INNER JOIN employees_paid b ON 
               (a.employee = B.employee AND 
               a.PercentOfTotal = b.PercentOfTotal AND 
               a.Month > b.Month)
     GROUP BY
          a.Month,
          b.Month
     HAVING
          Count(*) = (SELECT COUNT(*) FROM employees_paid c 
               where c.Month = a.Month)
     )
  1. The inner SELECT does a self join to identify matching employee and percentage combinations (except those for the same month). The > in the JOIN ensures that only one set of matches is taken i.e. if a Month1 entry = Month3 entry, we get only the Month3-Month1 entry combination instead of Month1-Month3, Month3-Month1 and Month3-Month3.
  2. We then GROUP by COUNT of matched entries for each month-month combination
  3. Then the HAVING excludes months that don't have as many matches as there are month entries
  4. The outer SELECT gets all entries except the ones returned by the inner query (the ones with full set matches)
potatopeelings
Hey thanks -- elegant, works in a general sense, and great explanation. Performance isn't as much of a concern for me, as it's a one-time data conversion script as opposed to production-level code.
+2  A: 

I just put together this solution while writing this question, which seems to work

I don't think it does work. Here I've added a further two groups (month = 4 and 5 respectively) which I would consider to be distinct yet the result is the same i.e. month = 1 and 2 only:

WITH employees_paid (Month, Employee, PercentOfTotal)
AS 
(
 SELECT 1, 'Alice', 0.25
 UNION ALL
 SELECT 1, 'Barbara', 0.65
 UNION ALL
 SELECT 1, 'Claire', 0.1
 UNION ALL
 SELECT 2, 'Alice', 0.25
 UNION ALL
 SELECT 2, 'Barbara', 0.5
 UNION ALL
 SELECT 2, 'Claire', 0.25
 UNION ALL
 SELECT 3, 'Alice', 0.25
 UNION ALL
 SELECT 3, 'Barbara', 0.65
 UNION ALL
 SELECT 3, 'Claire', 0.1
 UNION ALL
 SELECT 4, 'Barbara', 0.25
 UNION ALL
 SELECT 4, 'Claire', 0.65
 UNION ALL
 SELECT 4, 'Alice', 0.1
 UNION ALL
 SELECT 5, 'Diana', 0.25
 UNION ALL
 SELECT 5, 'Emma', 0.65
 UNION ALL
 SELECT 5, 'Fiona', 0.1
), 
temp_ids (Month)
AS
(
 SELECT DISTINCT MIN(Month)
   FROM employees_paid
  GROUP 
     BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
  FROM employees_paid AS EMP
       INNER JOIN temp_ids AS IDS 
          ON EMP.Month = IDS.Month
 GROUP 
    BY EMP.Month, EMP.Employee, EMP.PercentOfTotal;
onedaywhen
Good point -- however, in my case, there are always a fixed number of employees. Each distribution will have the same 3 employees, no more, no less, and no different. I can afford to take shortcuts based on that assumption, but you're correct in the general sense -- it won't work when new employees are introduced.
Well, if your solution works for you then it looks like the best of the bunch here to me ;)
onedaywhen