views:

108

answers:

4

I have data like this:

Task   | Hours
1.1    |    40
2      |    40
2.1    |    60
2.1.1  |    15
15.9   |    24
16     |     5
19.1   |    40
19.1.1 |     8
19.1.2 |    12
19.2   |     6
19.2.1 |    21
19.2.2 |    15
19.2.3 |     2
19.3   |    64

I would like to group based on the first two levels of the Task, producing this result:

Task   | Hours
1.1    |    40
2      |    40
2.1    |    75
15.9   |    24
16     |     5
19.1   |    60
19.2   |    44
19.3   |    64

I want the 16 to not roll up what's beneath it, but I need all the other levels to roll up. This is SQL Server 2005. I would normally do a split on the decimal, and break it out that way, but I was wondering if there's a better way to do it in SQL.

+1  A: 

Assuming the structure of the field task is consistent, you could use the following

select left(task,4) as Task,sum(hours) as Hours
from table
group by left(task,4)

Here is a slightly modified version

select LEFT(task,charindex('.',task+'.')+1),SUM(hours)
from test1
group by LEFT(task,charindex('.',task+'.')+1)
Sparky
I edited my data to show that it's not a consistent length.
Nathan DeWitt
I think your updated answer will still fail if the middle part if more than one digit? If you can determine the length of the middle part and replace "+1" with that value, I think you have it.
Phil Sandler
+2  A: 

Is changing the model an option? If your task column is really meant to represent a hierarchy, you should really be representing the hierarchy properly in your relational model.

If the number of levels deep is fixed at three, another option might be to add three columns to represent each of the "parts" of the task column independently.

If that's not an option, I think you can achieve this with a series of CASE statements that parse the string (plus SUM and GROUP BY).

UPDATE:

Ok, this seemed like a fun challenge, so I came up with this:

SELECT
    main_task,
    SUM(hours)
FROM
    (
    SELECT      
        task,
        CASE 
            WHEN 
                LEN(task) + 1 - CHARINDEX('.', REVERSE(task)) = CHARINDEX ('.', task) THEN task
                ELSE LEFT(task, LEN(task) + 1 - CHARINDEX('.', REVERSE(task)) - 1)
            END main_task,
        hours
    FROM 
        #temp
    ) sub
GROUP BY 
      main_task
Phil Sandler
I think you're right about storing the hierarchy in the model. I posted a possible way of doing that - is that along the lines of what you were thinking?
Nathan DeWitt
+1  A: 

Another route is to add some computed columns which break the various task levels apart, then group and sum as you wish.

Chris Lively
+1  A: 

I was thinking about this on my drive home, and I wanted to propose this solution:

Create a table that stores the hierarchy, and then do a join grabbing the task's parent.

TaskStructureTable:

task  | task_group
1     | 1
1.1   | 1.1
1.1.1 | 1.1
1.1.2 | 1.1
1.1.3 | 1.1
1.2   | 1.2
1.2.1 | 1.2

Then I could do something like this:

SELECT SUM(d.Hours) AS "Hours", t.task_group
FROM Data d
JOIN TaskStructureTable t ON d.Task = t.task

Think this would be faster than doing CHARINDEX? (yes, I can measure and know for sure)

Nathan DeWitt
I think if what you are proposing meets your requirement, you could almost think of task group as a "category", and not a part of a hierarchy. You wouldn't even need a separate table, though I would recommend one for referential integrity (sort of a category lookup). And yes, structuring it this way and putting an index on the category(_id) would definitely perform better than parsing the strings.
Phil Sandler
If you are going to change your model, and you are using sql 2008, then you might investigate using the hierarchyid data type. That would make it pretty simple to break out your reports at any level you want.
Chris Lively
SQL Server 2005 unfortunately.
Nathan DeWitt