ansaurus

Question

SQL Server 2005 database design - many-to-many relationships with hierarchy

Answer 1

+1 A:

Well, you wouldn't put it all into one table. You need to read up on normalizing data and joins. (And never store anything in a comma-delimited list.)

No database worth its salt would have the slightest problem handling a million records; that is a tiny database.

You need tables for functions, courses, locations, people and organization, and possibly some joining tables to accommodate many-to-many relationships. But none of this is hard or even beyond very basic design. I recommend that before you do anything, you get a book on your chosen database and read up on the basics.

HLGEM 2010-03-19 13:40:26

+1 for the last sentence, recommending some DB design books.

David Waters 2010-03-19 13:44:43

HLGEM - I have edited my original post to make things a bit clearer as I think I may have confused somewhat initially. BTW, ordered 'SQL for Smarties' by Celko. I am using SQL Server 2005 - any books you would recommend?

Remnant 2010-03-19 15:50:09

HLGEM 2010-03-19 17:45:33

Very Very true!

Henri 2010-03-22 21:50:15

You probably would not put it all in the same table. BUT read a bit about nested sets and some other approaches, and take into account that decent databases can do recursive queries. If the requirement is to work with hierarchies of any size and shape, then a single table might be the most appropriate approach.
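
To illustrate the recursive-query idea, here is a minimal sketch that runs on Python's built-in sqlite3 module. The org_unit table, its columns and the sample rows are assumptions made up for the example. SQLite needs the keyword WITH RECURSIVE; SQL Server 2005 supports recursive common table expressions too, but there you write plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE org_unit (          -- hypothetical single hierarchy table
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES org_unit(id)   -- NULL marks the root
    )
""")
conn.executemany(
    "INSERT INTO org_unit (id, name, parent_id) VALUES (?, ?, ?)",
    [
        (1, "Textile", None),   # assumed: a division at the top
        (2, "Finance", 1),      # a department under it
        (3, "Audit", 2),        # functions under the department
        (4, "Tax", 2),
    ],
)

# Walk the tree from the root downwards, tracking the depth of each node.
rows = conn.execute("""
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 0 FROM org_unit WHERE parent_id IS NULL
        UNION ALL
        SELECT c.id, c.name, t.depth + 1
          FROM org_unit AS c
          JOIN tree AS t ON c.parent_id = t.id
    )
    SELECT name, depth FROM tree ORDER BY depth, name
""").fetchall()
```

The same table can hold a hierarchy of any depth, which is the trade-off being pointed at: flexible shape, at the cost of the database no longer knowing which level is a division and which is a department.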

Unreason 2010-03-26 12:59:24

Answer 2

A:

Try giving each entity a table of its own, e.g.:

//Table Structure
location
    locationId
    name

division
    divisionId
    name
    locationId (fk => location)

department
    departmentId
    name
    divisionId (fk => division)

function
    functionId
    name
    departmentId(fk => department)

jobrole
    jobroleId
    name
    functionId (fk => function)

course
    courseID
    name

jobrole_course_requirement
    jobroleID
    courseID

employee
     employeeID
     name

employee_jobRole
     employeeID
     jobRoleId

employee_course_attendance
     employee_course_attendanceID
     employeeID
     courseID
     dateAttended

And a sample select:

-- Get course requirements for an employee
select course.name
  from employee_jobRole
  join jobrole_course_requirement
    on jobrole_course_requirement.jobroleID = employee_jobRole.jobRoleId
  join course
    on course.courseID = jobrole_course_requirement.courseID
 where employee_jobRole.employeeID = 123
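
A natural follow-up to the query above is "which required courses has this employee not attended yet?". The following is only a sketch, run against Python's built-in sqlite3 so it can be tried anywhere. The column types, the sample rows and the function name outstanding_courses are assumptions; the table and column names follow the structure above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (
        courseID INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE jobrole_course_requirement (
        jobroleID INTEGER NOT NULL,
        courseID  INTEGER NOT NULL REFERENCES course(courseID),
        PRIMARY KEY (jobroleID, courseID)
    );
    CREATE TABLE employee_jobRole (
        employeeID INTEGER NOT NULL,
        jobRoleId  INTEGER NOT NULL,
        PRIMARY KEY (employeeID, jobRoleId)
    );
    CREATE TABLE employee_course_attendance (
        employee_course_attendanceID INTEGER PRIMARY KEY,
        employeeID   INTEGER NOT NULL,
        courseID     INTEGER NOT NULL REFERENCES course(courseID),
        dateAttended TEXT NOT NULL
    );
""")
# Made-up sample data: role 10 requires courses 1 and 2; employee 123 holds role 10.
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(1, "Fire Safety"), (2, "First Aid"), (3, "Forklift")])
conn.executemany("INSERT INTO jobrole_course_requirement VALUES (?, ?)",
                 [(10, 1), (10, 2), (20, 3)])
conn.executemany("INSERT INTO employee_jobRole VALUES (?, ?)",
                 [(123, 10), (456, 20)])
# Employee 123 has already attended course 1.
conn.execute("INSERT INTO employee_course_attendance VALUES (1, 123, 1, '2010-01-15')")


def outstanding_courses(employee_id):
    """Courses required by the employee's job roles that they have not attended."""
    rows = conn.execute("""
        SELECT DISTINCT course.name
          FROM employee_jobRole AS ej
          JOIN jobrole_course_requirement AS req
            ON req.jobroleID = ej.jobRoleId
          JOIN course
            ON course.courseID = req.courseID
          LEFT JOIN employee_course_attendance AS att
            ON att.employeeID = ej.employeeID
           AND att.courseID = req.courseID
         WHERE ej.employeeID = ?
           AND att.courseID IS NULL
         ORDER BY course.name
    """, (employee_id,)).fetchall()
    return [name for (name,) in rows]
```

The LEFT JOIN keeps every required course and the IS NULL filter then keeps only those with no matching attendance row.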

David Waters 2010-03-19 13:41:51

David - I have edited my original post to make things a bit clearer as I think I may have confused somewhat initially.

Remnant 2010-03-19 15:51:03

Answer 3

A:

Usually when I am setting up a db, I work out which entities I need and how they are related to each other (i.e. many-to-one, one-to-one, ...), which you seem to have done. Next I figure out what each entity will need. For example, Location may have: locationid, address, ... Then, for Divisions, assuming that there is one location for many divisions, the division entity could have a divisionid, a locationid, and the information each division needs. So basically, if it's a one-to-many relationship, like one location to many divisions, you can just put the id of the location in the division table. However, if it is a many-to-many relationship, it is probably better to have an intermediary table connecting the two, so you do not need duplicate records with only an id changing.

John 2010-03-19 13:46:12

John - I have edited my original post to make things a bit clearer as I think I may have confused somewhat initially.

Remnant 2010-03-19 15:51:31

Answer 4

+1 A:

You need a simple star relationship. The Position (fact table) has just the IDs of the related master tables (Department, Division etc). This allows any combination of the master tables to be used.

The master tables can have a simple hierarchy built into each of them as needed, and can relate to each other as needed. But the detail of this does not affect the queries against Position.

You can make the IDs in Position nullable for optional relationships.

You could add StartDate and EndDate columns to Position to track changes over time.

A simple example of this is:

SQL Table Diagram
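
A minimal sketch of such a Position table, run against Python's built-in sqlite3. Everything concrete here is an assumption for illustration: the exact column list, the Name columns, the sample rows and the helper position_on. It only shows the nullable IDs and the StartDate/EndDate idea described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE Division   (DivisionID   INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Employee   (EmployeeID   INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Position (
        PositionID   INTEGER PRIMARY KEY,
        EmployeeID   INTEGER NOT NULL REFERENCES Employee(EmployeeID),
        DivisionID   INTEGER REFERENCES Division(DivisionID),      -- nullable: optional
        DepartmentID INTEGER REFERENCES Department(DepartmentID),  -- nullable: optional
        StartDate    TEXT NOT NULL,  -- ISO dates, so text comparison orders correctly
        EndDate      TEXT            -- NULL means the position is still current
    );
""")
conn.executemany("INSERT INTO Division VALUES (?, ?)", [(1, "Textile"), (2, "Marketing")])
conn.execute("INSERT INTO Department VALUES (1, 'Finance')")
conn.execute("INSERT INTO Employee VALUES (1, 'Ann')")
conn.executemany(
    "INSERT INTO Position VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, "2009-01-01", "2009-12-31"),   # Textile / Finance during 2009
        (2, 1, 2, None, "2010-01-01", None),        # Marketing, no department, current
    ],
)


def position_on(employee_id, day):
    """Return (division name, department name) held by the employee on the given day."""
    return conn.execute("""
        SELECT dv.Name, dp.Name
          FROM Position AS p
          LEFT JOIN Division   AS dv ON dv.DivisionID   = p.DivisionID
          LEFT JOIN Department AS dp ON dp.DepartmentID = p.DepartmentID
         WHERE p.EmployeeID = ?
           AND p.StartDate <= ?
           AND (p.EndDate IS NULL OR p.EndDate >= ?)
    """, (employee_id, day, day)).fetchone()
```

The LEFT JOINs matter: with nullable IDs an inner join would silently drop any position that has no department.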

TFD 2010-03-22 21:33:04

I believe your solution is not completely normalized. For example, Finance in Marketing has no Tax and Finance in Textile has no Accounts; that is something your model cannot reflect. Your solution is normalized only if there are no functional dependencies between DepartmentID, FunctionID and EmployeeID (i.e. if they are independent).

Unreason 2010-03-25 11:58:34

correction: that was supposed to be DepartmentID, FunctionID and DivisionID

Unreason 2010-03-25 13:14:43

Duh! It's a sample to show the possibilities. You can add dependencies as required (example shown for department and division). We don't have enough information to make it fully *normalised*, @Remnant can just add a relationship from department to function etc

TFD 2010-03-25 20:44:29

@TFD: Really? I thought I quoted example data from the question, which clearly shows that the star model is not a good choice.

Unreason 2010-03-26 08:22:33

Answer 5

A:

Perhaps (probably) you should consider the HR department of the Textile division as a different department than the HR department of the Marketing division.

erikkallen 2010-03-22 21:50:29

Answer 6

+1 A:

Based on the updated post, and making some (fairly obvious) assumptions based on the names used, I come up with the following. There are four entities:

Divisions
Departments
Functions
Employees

There are many relationships between these entities. Few of them are hierarchical, most are simple associations:

Option A1: There is a master list of functions. Every department can perform (or do) one or more functions, and a function might be performed by more than one department.
Option A2: Functions are “owned” by departments. No function can be performed by two or more departments. (This appears to be the case, as the HR Dept has Payroll and Hiring, and the Finance Dept has Audit, Tax, and Accounts.)
Functions are performed by departments for (on behalf of) divisions. (HR Dept does Payroll and Hiring for both Textile and Marketing divisions; Finance Dept does Audit and Tax--but not Accounts--for Textile division, and Audit and Accounts--but not Tax--for Marketing division.) Perhaps a bit more precisely, departments perform selected functions for selected divisions that they are associated with, and that association is defined by their performance of that function.
Beyond performing the work of functions, there appears to be no relationship between departments and divisions. There is no hierarchical relationship between them, as one does not “own” or contain the other.

This leads to these roughly sketched out tables:

--  Division  -----
DivisionId  (primary key)

--  Department  ---
DepartmentId  (primary key)

--  Function  -----  (assumes option A2)
FunctionId   (primary key)
DepartmentId (foreign key, references Department)

--  DivisionFunctions  ----
DivisionId  (First column of compound primary key)
FunctionId  (Second column of compound primary key)

(You could optionally include a surrogate key to uniquely identify each row, but DivisionId + FunctionId would work.)
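
As a rough check of the sketch above, here it is loaded with the HR/Finance and Textile/Marketing data mentioned earlier, using Python's built-in sqlite3. Using the names themselves as key values and the helper functions_for are assumptions made for readability. Function is double-quoted because it is a reserved word in some databases (in SQL Server you would write [Function]).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE Division   (DivisionId   TEXT PRIMARY KEY);
    CREATE TABLE Department (DepartmentId TEXT PRIMARY KEY);
    CREATE TABLE "Function" (                       -- option A2: owned by one department
        FunctionId   TEXT PRIMARY KEY,
        DepartmentId TEXT NOT NULL REFERENCES Department(DepartmentId)
    );
    CREATE TABLE DivisionFunctions (
        DivisionId TEXT NOT NULL REFERENCES Division(DivisionId),
        FunctionId TEXT NOT NULL REFERENCES "Function"(FunctionId),
        PRIMARY KEY (DivisionId, FunctionId)
    );
""")
conn.executemany("INSERT INTO Division VALUES (?)", [("Textile",), ("Marketing",)])
conn.executemany("INSERT INTO Department VALUES (?)", [("HR",), ("Finance",)])
conn.executemany('INSERT INTO "Function" VALUES (?, ?)', [
    ("Payroll", "HR"), ("Hiring", "HR"),
    ("Audit", "Finance"), ("Tax", "Finance"), ("Accounts", "Finance"),
])
conn.executemany("INSERT INTO DivisionFunctions VALUES (?, ?)", [
    ("Textile", "Payroll"), ("Textile", "Hiring"),
    ("Textile", "Audit"), ("Textile", "Tax"),            # no Accounts for Textile
    ("Marketing", "Payroll"), ("Marketing", "Hiring"),
    ("Marketing", "Audit"), ("Marketing", "Accounts"),   # no Tax for Marketing
])


def functions_for(division_id):
    """List (department, function) pairs performed for the given division."""
    return conn.execute("""
        SELECT f.DepartmentId, f.FunctionId
          FROM DivisionFunctions AS df
          JOIN "Function" AS f ON f.FunctionId = df.FunctionId
         WHERE df.DivisionId = ?
         ORDER BY f.DepartmentId, f.FunctionId
    """, (division_id,)).fetchall()
```

Note that the department never appears in DivisionFunctions; it is derived through the function, which is what option A2 buys you.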

There isn’t enough material here to fully describe how "employees" fit into the model. Given that employees do the work of functions: can an employee do the work of more than one function, or do they only do the one? Does an employee do the work of the function regardless of the division(s) it is being done for, or are they assigned to do the work for one or more divisions? Two obvious options here, though more complex variants are possible:

Option B1: Employees do the work of one or more functions within departments, and perform that work for all divisions that require that function of that department.
Option B2: Employees are assigned to perform a specific function for a specific division.

Given these, tables might look like:

--  Employee  -----  (assumes option B1)
EmployeeId    (primary key)
DepartmentId  (foreign key, references Department)

--  EmployeeFunction  -----  (assumes option B1)
EmployeeId  (First column of compound primary key)
FunctionId  (Second column of compound primary key)

... and thus all employees that can perform a function will perform it for all divisions requiring it. Or,

--  Employee  -----  (assumes option B2)
EmployeeId  (primary key)
DepartmentId  (foreign key, references Department)

--  EmployeeAssignment  -----  (assumes option B2)
EmployeeId  (foreign key, references Employee)
DivisionId  (first of two-column foreign key referencing DivisionFunctions)
FunctionId  (second of two-column foreign key referencing DivisionFunctions)

(Or, instead of DivisionId and FunctionId, include the optional surrogate key from DivisionFunctions.) ... and thus employees are assigned individually to functions to be performed by the department for a division.
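
For option B2, the two-column foreign key is what stops an employee being assigned to a function that a division does not use. A small sketch with Python's built-in sqlite3 follows; it is trimmed to the tables involved, and the sample rows and the helper assign are made up. Making EmployeeId the primary key of EmployeeAssignment is an extra assumption that limits each employee to a single assignment; drop it if employees may hold several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    -- Trimmed: the Division and Function parent tables are left out for brevity.
    CREATE TABLE DivisionFunctions (
        DivisionId TEXT NOT NULL,
        FunctionId TEXT NOT NULL,
        PRIMARY KEY (DivisionId, FunctionId)
    );
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY);
    CREATE TABLE EmployeeAssignment (
        EmployeeId INTEGER PRIMARY KEY REFERENCES Employee(EmployeeId),
        DivisionId TEXT NOT NULL,
        FunctionId TEXT NOT NULL,
        FOREIGN KEY (DivisionId, FunctionId)
            REFERENCES DivisionFunctions (DivisionId, FunctionId)
    );
""")
conn.executemany("INSERT INTO DivisionFunctions VALUES (?, ?)",
                 [("Textile", "Tax"), ("Marketing", "Accounts")])
conn.executemany("INSERT INTO Employee VALUES (?)", [(1,), (2,)])


def assign(employee_id, division_id, function_id):
    """Return True if the assignment was stored, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO EmployeeAssignment VALUES (?, ?, ?)",
                     (employee_id, division_id, function_id))
        return True
    except sqlite3.IntegrityError:
        return False


ok = assign(1, "Textile", "Tax")           # a pair that exists in DivisionFunctions
rejected = assign(2, "Marketing", "Tax")   # Marketing has no Tax, so the FK fails
```

The same rule could be checked in application code, but putting it in the foreign key means no code path can bypass it.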

But that still leaves a lot of “what if/when” questions: Do employees “belong to” departments? Can employees belong to (work for) multiple departments? Perhaps employees belong to divisions? Do you track what functions an employee can do, even if they are not currently doing it? Similarly, do you track what department an employee works for, even if they are currently “between functions”? If an employee can perform functions A and B, and a division requires both these functions, might an employee be assigned to only perform A and not B for that division?

There’s more requirements research to be done here, but I’d like to think this is a good start.

Philip Kelley 2010-03-25 03:51:19

Philip - Great, comprehensive response. I think I could really use this to augment my thinking and approach - I'll digest this over the next 24 hours. Quick note - employees can only work for one function. They cannot work for multiple functions or divisions.

Remnant 2010-03-25 08:29:14

Answer 7

+1 A:

As you are "abecedarian" :), one thing to do before any attempt to feel at home with database design is to read about normalization, and to completely understand all normal forms up to 5NF.

If you want to model that
1. departments are in divisions
2. functions are performed in departments
3. employees perform functions

and that not all functions are performed in all of the departments, nor are all the departments in all divisions, then you have to store that fact somewhere.

While doing logical design, give your tables descriptive names, so some departments are in divisions

departments_in_divisions
candidate key: department, division

then you have some functions in some departments

functions_departments_divisions
candidate key: function, department, division
references: (department, division) in departments_in_divisions

then employees have some functions from some departments and divisions

employees_function_department_division
candidate key: employee, function, department, division
references: (function, department, division) in functions_departments_divisions

After (or before) this you have 3 more entities: functions, departments and divisions. These list all the possible departments, divisions and functions, and would also be referenced by the above tables (this might not be completely normalized).

Also the names of the entities (tables) can become something more appropriate to you (only you can know the full semantics of the model of your data). Especially if you notice that you need to assign other attributes (fields) to them.

The values for departments, divisions and functions are their names; there are no artificial ids yet in the above analysis. You can introduce them in the next step (after the logical modelling comes physical modelling), or you can keep the natural keys. If you go with artificial keys, that can cut composite keys down to 2 columns at most, but it does obfuscate the relationships and the meaning of the facts that you are storing in your tables. (For example, functionID can be an ID of a function name or an ID of a function that is performed in a certain division/department combination. It is not clear which it is, and these are not interchangeable; sort of like the difference between an instance and a class.)

Unreason 2010-03-25 13:13:43
