views:

35

answers:

1

My database is MS SQL 2008.

I"m basically merging some sets of data together from two or more databases to end up with one owner of a set of data possibly related by two fields.

Table

ID     Name     Code
1      Ben      1
2      Ben      1
3      Frank    1
4      Frank    2
5      Mark     2
6      Mary     3
7      Chuck    3
8      Rogue    10
9      Charles  11

The data is in no order, the "Parent" doesn't matter, as long as there is one per group. The children of the parent is a set of records that are related by having the same name, or same Code, or both. Each record can only appear once in the result IE. cant belong to more than one group.

Here is one possible result (hierarchy doesn't have to be represented this way):

ID     Name     Code     ParentID
1      Ben      1        NULL
2      Ben      1        1
3      Frank    1        1
4      Frank    2        1
5      Mark     2        1
6      Mary     3        NULL
7      Chuck    3        6

Record ID {1} is the parent of group 1 (picked because first of the common set)

{2} shares the same name so its included (also could be included because same Code)

{3} shares same code so its included

{4} shares same name as {3} so its included

{5} shares same code with {4} so its included

{6} and {7} share same code, so form a new group.

{8} and {9} are excluded from the result as there is no other common records.

I think ive come up with a solution at work which uses about 3 or 4 joins of this table on itself, and its rather convoluted. Any suggestions on how to tackle this? I sense maybe a use of recursive CTE's but I cant wrap my brain around it.

A: 

I don't think that a recursive CTE is going to work here. The query is based entirely upon sequential logic and there's no logical "next set" for any given state because you can't know where the stop point is without first scanning each row one-by-one; in other words, the results essentially need to be evaluated row-by-row. The purpose of using a recursive CTE is to be able to append sets; if you're just appending rows then what you end up with is no better than a cursor.

I would actually use a CLR User-Defined Aggregate for something like this, because there is no good-performance pure SQL solution that I can think of, but if you need a pure SQL solution, here is one using regular (not recursive) CTEs and the windowing functions:

;WITH Rows_CTE AS
(
    SELECT
        ID, Name, Code,
        ROW_NUMBER() OVER (ORDER BY ID) AS RowNum
    FROM @Tbl
),
Changes_CTE AS
(
    SELECT
        r1.ID, r1.Name, r1.Code,
        CASE
            WHEN r1.Name = r2.Name OR r1.Code = r2.Code THEN NULL
            ELSE r1.ID
        END AS BeginGroupID
    FROM Rows_CTE r1
    LEFT JOIN Rows_CTE r2
        ON r2.RowNum = r1.RowNum - 1
),
Groups_CTE AS
(
    SELECT ID, Name, Code, BeginGroupID, m.EndGroupID
    FROM Changes_CTE c1
    CROSS APPLY
    (
        SELECT MIN(ID) AS EndGroupID
        FROM Changes_CTE c2
        WHERE c2.ID > c1.BeginGroupID
        AND c2.BeginGroupID IS NOT NULL
    ) m
)
SELECT
    t.*,
    CASE
        WHEN t.ID = g.BeginGroupID THEN NULL
        ELSE g.BeginGroupID
    END AS ParentID
FROM Groups_CTE g
INNER JOIN @Tbl t
    ON t.ID >= g.BeginGroupID
    AND t.ID < g.EndGroupID

That gets the results you're asking for. It could be written in a more compact form but I've tried to aim for readability.

(Addendum: We could use a recursive CTE and improve this significantly if it were known at the beginning that each and every name/code can only go under one "parent" - but that assumption is not documented anywhere, so we really have to assume worst-case.)

Aaronaught
Works well. However it did leave the "Rogue" one in the result when it should be left out since it has no children (solvable by doing a count on the "group").I didn't realize you could use row_number() without a PARTITION to create a row number. I also never seen CROSS APPLY used on a sub-query, Ive only used it on a function only.The downside is I realized from your answer I was missing items from my requirements and I apologize. I tried to simplify for clarity and left out some key parts. I will create a new question since this is a Problem with a correct answer.
Ben Dempsey