views:

2251

answers:

8

I need to sum points on each level earned by a tree of users. Level 1 is the sum of users' points of the users 1 level below the user. Level 2 is the Level 1 points of the users 2 levels below the user, etc...

The calculation happens once a month on a non production server, no worries about performance.

What would the SQL look like to do it?

If you're confused, don't worry, I am as well!

User table:

ID    ParentID    Points
1     0           230
2     1           150
3     0           80
4     1           110
5     4           54
6     4           342

Tree:
0
|---\
1    3
| \
2  4---
    \  \
     5  6

Output should be:

ID    Points    Level1     Level2
1     230       150+110    150+110+54+342
2     150
3     80
4     110       54+342
5     54
6     342

SQL Server Syntax and functions preferably...

+1  A: 

I would say: create a stored procedure, probably has the best performance. Or if you have a maximum number of levels, you could create subqueries, but they will have a very poort performance.

(Or you could get MS SQL Server 2008 and get the new hierarchy functions... ;) )

Grad van Horck
My thinking as well, but what would the Procedure look like?
Jrgns
+2  A: 

If you were using Oracle DBMS that would be pretty straightforward since Oracle supports tree queries with the CONNECT BY/STARTS WITH syntax. For SQL Server I think you might find Common Table Expressions useful

Manrico Corazzi
+2  A: 

Trees don't work well with SQL. If you have very (very very) few write accesses, you could change the tree implementation to use nested sets, that would make this query incredibly easy (SELECT SUM(points) FROM users where left > x and right < y, if I'm not mistaken). However, any changes on the tree require touching a massive amount of rows. It's probably better to just do the recursion in you client.

MattW.
A: 

You have a couple of options:

  1. Use a cursor and a recursive user-defined function call (it's quite slow)
  2. Create a cache table, update it on INSERT using a trigger (it's the fastest solution but could be problematic if you have lots of updates to the main table)
  3. Do a client-side recursive calculation (preferable if you don't have too many records)
Ilya Kochetov
A: 
Matthias Kestenholz
A: 

You can write a simple recursive function to do the job. My MSSQL is a little bit rusty, but it would look like this:

CREATE FUNCTION CALC
(
@node integer,
)
returns 
(
@total integer
)
as
begin
    select @total = (select node_value from yourtable where node_id = @node);

    declare @children table (value integer);
    insert into @children   
    select calc(node_id) from yourtable where parent_id = @node;

    @current = @current + select sum(value) from @children;
    return
end
Milan Babuškov
Ok, what will the function look like?
Jrgns
I don't have MSSQL instalation here, but it would go something like:getsum(parentNode int) sum = select value where node = parentNode; foreach row in select children from table where parent = parentNode sum = sum + getsum(childnode)You would call it on the top node.
Milan Babuškov
+1  A: 

SQL in general, like others said, does not handle well such relations. Typically, a surrogate 'relations' table is needed (id, parent_id, unique key on (id, parent_id)), where:

  • every time you add a record in 'table', you:

    INSERT INTO relations (id, parent_id) VALUES ([current_id], [current_id]);

    INSERT INTO relations (id, parent_id) VALUES ([current_id], [current_parent_id]);

    INSERT INTO relations (id, parent_id) SELECT [current_id], parent_id FROM relations WHERE id = [current_parent_id];

  • have logic to avoid cycles

  • make sure that updates, deletions on 'relations' are handled with stored procedures

Given that table, you want:

SELECT rel.parent_id, SUM(tbl.points)
FROM table tbl INNER JOIN relations rel ON tbl.id=rel.id
WHERE rel.parent_id <> 0
GROUP BY rel.parent_id;
ΤΖΩΤΖΙΟΥ
A: 

Ok, this gives you the results you are looking for, but there are no guarantees that I didn't miss something. Consider it a starting point. I used SQL 2005 to do this, SQL 2000 does not support CTE's

WITH Parent (id, GrandParentId, parentId, Points, Level1Points, Level2Points)
AS
(
    -- Find root
    SELECT id,  
      0 AS GrandParentId,
      ParentId,
      Points,
      0 AS Level1Points,
      0 AS Level2Points
    FROM tblPoints ptr
    WHERE ptr.ParentId = 0

    UNION ALL (
    -- Level2 Points
    SELECT pa.GrandParentId AS Id,
      NULL AS GrandParentId,
      NULL AS ParentId,
      0 AS Points, 
      0 AS Level1Points,
      pa.Points  AS Level2Points
    FROM tblPoints pt
      JOIN Parent pa ON pa.GrandParentId = pt.Id 
    UNION  ALL
    -- Level1 Points
    SELECT pt.ParentId AS Id,
      NULL AS GrandParentId,
      NULL AS ParentId,
      0 AS Points, 
      pt.Points AS Level1Points,
      0 AS Level2Points
    FROM tblPoints pt
      JOIN Parent pa ON pa.Id = pt.ParentId AND pa.ParentId IS NOT NULL 
    UNION  ALL
    -- Points
    SELECT pt.id,
      pa.ParentId AS GrandParentId,
      pt.ParentId,
      pt.Points, 
      0 AS Level1Points,
      0 AS Level2Points
    FROM tblPoints pt
      JOIN Parent pa ON pa.Id = pt.ParentId AND pa.ParentId IS NOT NULL )
)
SELECT id, 
    SUM(Points) AS Points,  
    SUM(Level1Points) AS Level1Points,
    CASE WHEN SUM(Level2Points) > 0 THEN  SUM(Level1Points) + SUM(Level2Points) ELSE 0 END AS Level2Points
FROM Parent
GROUP BY id 
ORDER by id
Darrel Miller