ansaurus

Question

How do I efficiently group data in a hierarchical format in T-SQL?

Answer 1

+1 A:

Assuming the structure of the field task is consistent, you could use the following

select left(task,4) as Task,sum(hours) as Hours
from table
group by left(task,4)

Here is a slightly modified version

select LEFT(task,charindex('.',task+'.')+1),SUM(hours)
from test1
group by LEFT(task,charindex('.',task+'.')+1)

Sparky 2010-01-06 22:34:05

I edited my data to show that it's not a consistent length.

Nathan DeWitt 2010-01-06 22:40:20

I think your updated answer will still fail if the middle part if more than one digit? If you can determine the length of the middle part and replace "+1" with that value, I think you have it.

Phil Sandler 2010-01-06 23:14:03

Answer 2

+2 A:

Is changing the model an option? If your task column is really meant to represent a hierarchy, you should really be representing the hierarchy properly in your relational model.

If the number of levels deep is fixed at three, another option might be to add three columns to represent each of the "parts" of the task column independently.

If that's not an option, I think you can achieve this with a series of CASE statements that parse the string (plus SUM and GROUP BY).

UPDATE:

Ok, this seemed like a fun challenge, so I came up with this:

SELECT
    main_task,
    SUM(hours)
FROM
    (
    SELECT      
        task,
        CASE 
            WHEN 
                LEN(task) + 1 - CHARINDEX('.', REVERSE(task)) = CHARINDEX ('.', task) THEN task
                ELSE LEFT(task, LEN(task) + 1 - CHARINDEX('.', REVERSE(task)) - 1)
            END main_task,
        hours
    FROM 
        #temp
    ) sub
GROUP BY 
      main_task

Phil Sandler 2010-01-06 22:54:48

I think you're right about storing the hierarchy in the model. I posted a possible way of doing that - is that along the lines of what you were thinking?

Nathan DeWitt 2010-01-07 06:53:15

Answer 3

+1 A:

Another route is to add some computed columns which break the various task levels apart, then group and sum as you wish.

Chris Lively 2010-01-06 23:00:38

Answer 4

+1 A:

I was thinking about this on my drive home, and I wanted to propose this solution:

Create a table that stores the hierarchy, and then do a join grabbing the task's parent.

TaskStructureTable:

task  | task_group
1     | 1
1.1   | 1.1
1.1.1 | 1.1
1.1.2 | 1.1
1.1.3 | 1.1
1.2   | 1.2
1.2.1 | 1.2

Then I could do something like this:

SELECT SUM(d.Hours) AS "Hours", t.task_group
FROM Data d
JOIN TaskStructureTable t ON d.Task = t.task

Think this would be faster than doing CHARINDEX? (yes, I can measure and know for sure)

Nathan DeWitt 2010-01-07 00:58:54

I think if what you are proposing meets your requirement, you could almost think of task group as a "category", and not a part of a hierarchy. You wouldn't even need a separate table, though I would recommend one for referential integrity (sort of a category lookup). And yes, structuring it this way and putting an index on the category(_id) would definitely perform better than parsing the strings.

Phil Sandler 2010-01-07 03:23:13

If you are going to change your model, and you are using sql 2008, then you might investigate using the hierarchyid data type. That would make it pretty simple to break out your reports at any level you want.

Chris Lively 2010-01-07 21:18:40

SQL Server 2005 unfortunately.

Nathan DeWitt 2010-01-08 04:30:24

ansaurus

tags:

views:

answers:

How do I efficiently group data in a hierarchical format in T-SQL?

related questions