views:

50

answers:

3

I've been doing quite a bit of searching, but haven't been able to find many resources on the topic. My goal is to store scheduling data like you would find in a Gantt chart. So one example of storing the data might be:

Task Id | Name    | Duration
1         Task A    1
2         Task B    3
3         Task C    2

Task Id | Predecessors
1         Null
2         Null
3         1
3         2

Which would have Task C waiting for both Task A and Task B to complete.

So my question is: What is the best way to store this kind of data and efficiently query it? Any good resources for this kind of thing? There is a ton of information about tree structures, but once you add in multiple parents it becomes hard to find info. By the way, I'm working with SQL Server and .NET for this task.

+2  A: 

Use adjacency list model:

chain

task_id  predecessor
3        1
3        2

and this query to find all predecessors of the given task:

WITH    q AS
        (
        SELECT  predecessor
        FROM    chain
        WHERE   task_id = 3
        UNION ALL
        SELECT  c.predecessor
        FROM    q
        JOIN    chain c
        ON      c.task_id = q.predecessor
        )
SELECT  *
FROM    q

To get the duration of the longest parent for each task:

WITH    q AS
        (
        SELECT  task_id, duration
        FROM    tasks
        UNION ALL
        SELECT  t.task_id, t.duration
        FROM    q
        JOIN    chain с
        ON      c.task_id = q.task_id
        JOIN    tasks t
        ON      t.task_id = c.predecessor
        )
SELECT  task_id, MAX(duration)
FROM    q
Quassnoi
Perhaps I should be more specific. I've looked into the adjacency list model, but I haven't found good ways of doing stuff like rolling up the duration of all my tasks. It would be easy if they didn't have multiple parents, but how can I efficiently account for the fact that I don't want the sum of both parents, but the longest duration of them all?
Ocelot20
@BPotocki: please post some sample data and the resultset you'd like to get.
Quassnoi
+2  A: 

Your problem is related to the concept of relationship cardinality. All relationships have some cardinality, which expresses the potential number of instances on each side of the relationship that are members of it, or can participate in a single instance of the relationship. As an example, for people, (for most living things, I guess, with rare exceptions), the Parent-Child relationship has a cardinality of 2 to zero or many, meaning it takes two parents on the parent side, and there can be zero or many children (perhaps it should be 2 to 1 or many)

In database design, generally, anything that has a 1(one), (or a zero or one), on one side can be easily represented with just two tables, one for each entity, (sometimes only one table is needed see note**) and a foreign key column in the table representing the "many" side, that points to the other table holding the entity on the "one" side.

In your case you have a many to many relationship. (A Task can have multiple predecessors, and each predecessors can certainly be the predecessor for multiple tasks) In this case a third table is needed, where each row, effectively, represents an association between 2 tasks, representing that one is the predecessor to the other. Generally, This table is designed to contain only all the columns of the primary keys of the two parent tables, and it's own primary key is a composite of all the columns in both parent Primary keys. In your case it simply has two columns, the taskId, and the PredecessorTaskId, and this pair of Ids should be unique within the table so together they form the composite PK.

When querying, to avoid double counting data columns in the parent tables when there are multiple joins, simply base the query on the parent table... e.g., to find the duration of the longest parent, Assuming your association table is named TaskPredecessor

  Select TaskId, Max(P.Duration)
  From Task T Join Task P
     On P.TaskId In (Select PredecessorId 
                     From TaskPredecessor
                     Where TaskId = T.TaskId)

** NOTE. In cases where both entities in the relationship are of the same entity type, they can both be in the same table. The canonical (luv that word) example is an employee table with the many to one relationship of Worker to Supervisor... Since the supervisor is also an employee, both workers and supervisors can be in the same [Employee] table, and the realtionship can gbe modeled with a Foreign Key (called say SupervisorId) that points to another row in the same table and contains the Id of the employee record for that employee's supervisor.

Charles Bretana
+1  A: 

Check "Hierarchical Weighted total" pattern in "SQL design patterns" book, or "Bill Of Materials" section in "Trees and Hierarchies in SQL".

In a word, graphs feature double aggregation. You do one kind of aggregation along the nodes in each path, and another one across alternative paths. For example, find a minimal distance between the two nodes is minimum over summation. Hierarchical weighted total query (aka Bill Of Materials) is multiplication of the quantities along each path, and summation along each alternative path:

     
       with TCAssembly as (
          select Part, SubPart, Quantity AS factoredQuantity
          from AssemblyEdges
          where Part = ‘Bicycle’
          union all
          select te.Part, e.SubPart, e.Quantity * te.factoredQuantity
          from TCAssembly te, AssemblyEdges e
          where te.SubPart = e.Part
       ) select SubPart, sum(Quantity) from TCAssembly
       group by SubPart        
Tegiri Nenashi