Note

I have completely re-written my original post to better explain the issue I am trying to understand. I have tried to generalise the problem as much as possible.

Also, my thanks to the original people who responded. Hopefully this post makes things a little clearer.

Context

In short, I am struggling to understand the best way to design a small scale database to handle (what I perceive to be) multiple many-to-many relationships.

Imagine the following scenario for a company organisational structure:

             Textile Division                    Marketing Division
                    |                                     |
          ----------------------               ----------------------
          |                    |               |                    |
       HR Dept           Finance Dept        HR Dept           Finance Dept
          |                    |               |                    |
      ----------          ----------       ----------           ---------
     |          |         |        |       |        |           |       |
  Payroll     Hiring    Audit     Tax   Payroll   Hiring      Audit  Accounts
     |          |         |        |       |        |           |       |
    Emps      Emps       Emps     Emps    Emps     Emps        Emps    Emps    

NB: Emps denotes a list of employees that work in that area

When I first started with this issue I made four separate tables:

  1. Divisions -> Textile, Marketing (PK = DivisionID)
  2. Departments -> HR, Finance (PK = DeptID)
  3. Functions -> Payroll, Hiring, Audit, Tax, Accounts (PK = FunctionID)
  4. Employees -> List of all Employees (PK = EmployeeID)

The problem, as I see it, is that there are multiple many-to-many relationships, i.e. many departments have many divisions and many functions have many departments.

Question

Given the database structure above, suppose I wanted to do the following:

  • Get all employees who work in the Payroll function of the Marketing Division

To do this I need to be able to differentiate between the two Payroll functions, but I am not sure how this can be done.

I understand that I could build a 'Link / Junction' table between Departments and Functions so that I can retrieve which Functions are in which Departments. However, I would still need to differentiate the Division they belong to.

Research Effort

As you can see, I am an abecedarian when it comes to database design. I have spent the last two days researching this issue, traversing nested set models, adjacency models, reading that this issue is known not to be NP complete etc. I am sure there is a simple solution?

+1  A: 

Well, you wouldn't put it all into one table. You need to read up on normalizing data and joins. (And never store anything in a comma-delimited list.)

No database worth its salt would have the slightest problem handling a million records; that is a tiny database.

You need tables for functions, courses, locations, people, organization and possibly some joining tables to accommodate many to many relationships. But none of this is hard or even beyond very basic design. I recommend that before you do anything, you get a book on your chosen database and read up on the basics.

HLGEM
+1 for the last sentence, recommending some DB design books.
David Waters
HLGEM - I have edited my original post to make things a bit clearer, as I think I may have confused things somewhat initially. BTW, ordered 'SQL for Smarties' by Celko. I am using SQL Server 2005 - any books you would recommend?
Remnant
HLGEM
Very Very true!
Henri
You probably would not put it all in the same table, BUT if you read a bit about nested sets and other approaches, and take into account that decent databases can do recursive queries, then a single table might be the most appropriate approach if the requirement is to work with hierarchies of any size and shape.
Unreason
A: 

Try giving each entity a table of its own, e.g.

//Table Structure
location
    locationId
    name

division
    divisionId
    name
    locationId (fk => location)

department
    departmentId
    name
    divisionId (fk => division)

function
    functionId
    name
    departmentId(fk => department)

jobrole
    jobroleId
    name
    functionId (fk => function)

course
    courseID
    name

jobrole_course_requirement
    jobroleID
    courseID

employee
     employeeID
     name

employee_jobRole
     employeeID
     jobRoleId

employee_course_attendance
     employee_course_attendanceID
     employeeID
     courseID
     dateAttended

And then some sample selects

-- Get course requirements for an employee
select course.name
  from course,
       jobrole_course_requirement,
       employee_jobRole
  where
       employee_jobRole.employeeID = 123 and
       -- match the employee's job roles to the courses those roles require
       jobrole_course_requirement.jobroleID = employee_jobRole.jobRoleId and
       course.courseID = jobrole_course_requirement.courseID
David Waters
David - I have edited my original post to make things a bit clearer, as I think I may have confused things somewhat initially.
Remnant
A: 

Usually when I am setting up a db, I first work out which entities I need and how they are related to each other (i.e. many-to-one, one-to-one, ...), which you seem to have done. Next I figure out what attributes each entity will need. For example, Location may have: locationid, address, ... Then, for Divisions, assuming that there is one location for many divisions, the division entity could have a divisionid, a locationid, and whatever other information each division needs. So basically, if it is a one-to-many relationship, like one location to many divisions, you can just put the id of the location in the division table. However, if it is a many-to-many relationship, it is probably better to have an intermediary table to connect the two, so you do not need duplicate records with only an id changing.
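
A minimal sketch of those two cases, using Python's built-in sqlite3 module purely so that it can be run as-is (the asker is on SQL Server 2005, where the same idea applies). The table and column names, the division_department junction table and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One-to-many: the "many" side (division) carries the id of the "one" side.
    CREATE TABLE location (
        locationId INTEGER PRIMARY KEY,
        address    TEXT NOT NULL
    );
    CREATE TABLE division (
        divisionId INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        locationId INTEGER NOT NULL REFERENCES location(locationId)
    );

    -- Many-to-many: an intermediary table holds one row per pairing.
    CREATE TABLE department (
        departmentId INTEGER PRIMARY KEY,
        name         TEXT NOT NULL
    );
    CREATE TABLE division_department (
        divisionId   INTEGER NOT NULL REFERENCES division(divisionId),
        departmentId INTEGER NOT NULL REFERENCES department(departmentId),
        PRIMARY KEY (divisionId, departmentId)
    );

    -- Hypothetical sample rows.
    INSERT INTO location VALUES (1, 'Head office');
    INSERT INTO division VALUES (1, 'Textile', 1), (2, 'Marketing', 1);
    INSERT INTO department VALUES (1, 'HR'), (2, 'Finance');
    INSERT INTO division_department VALUES (1, 1), (1, 2), (2, 1), (2, 2);
""")

# Which departments exist in the Marketing division?
rows = conn.execute("""
    SELECT d.name
    FROM division_department AS dd
    JOIN department AS d ON d.departmentId = dd.departmentId
    JOIN division   AS v ON v.divisionId   = dd.divisionId
    WHERE v.name = 'Marketing'
    ORDER BY d.name
""").fetchall()
departments_in_marketing = [name for (name,) in rows]
print(departments_in_marketing)  # ['Finance', 'HR']
```

Note that 'HR' and 'Finance' are each stored once; only the pairing rows are repeated per division.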

John
John - I have edited my original post to make things a bit clearer, as I think I may have confused things somewhat initially.
Remnant
+1  A: 

You need a simple star relationship. The Position (fact table) has just the IDs of the related master tables (Department, Division etc). This allows any combination of the master tables to be used.

The master tables can have a simple hierarchy built into each of them as needed, and can relate to each other as needed. But the detail of this does not affect the queries against Position.

You can make the IDs in Position nullable for optional relationships.

You could add StartDate and EndDate columns to Position to track changes over time.

A simple example of this is:

SQL Table Diagram
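
A minimal sketch of what such a star layout could look like, written against Python's built-in sqlite3 module so it can be run directly. Everything except the Position, Department and Division names and the StartDate/EndDate columns is an assumption: the OrgFunction table name (chosen because FUNCTION is a reserved word in SQL Server), the column names and the sample employees are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Master tables (illustrative names and rows).
    CREATE TABLE Division    (DivisionID   INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Department  (DepartmentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE OrgFunction (FunctionID   INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Employee    (EmployeeID   INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- Fact table: one row per position, holding only the related IDs.
    CREATE TABLE Position (
        PositionID   INTEGER PRIMARY KEY,
        DivisionID   INTEGER NOT NULL REFERENCES Division(DivisionID),
        DepartmentID INTEGER NOT NULL REFERENCES Department(DepartmentID),
        FunctionID   INTEGER NOT NULL REFERENCES OrgFunction(FunctionID),
        EmployeeID   INTEGER REFERENCES Employee(EmployeeID),  -- nullable: optional
        StartDate    TEXT,
        EndDate      TEXT
    );

    INSERT INTO Division    VALUES (1, 'Textile'), (2, 'Marketing');
    INSERT INTO Department  VALUES (1, 'HR'), (2, 'Finance');
    INSERT INTO OrgFunction VALUES (1, 'Payroll'), (2, 'Audit');
    INSERT INTO Employee    VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO Position (DivisionID, DepartmentID, FunctionID, EmployeeID) VALUES
        (1, 1, 1, 1),   -- Alice: Textile / HR / Payroll
        (2, 1, 1, 2),   -- Bob:   Marketing / HR / Payroll
        (2, 2, 2, 3);   -- Carol: Marketing / Finance / Audit
""")

# All employees who work in the Payroll function of the Marketing division.
rows = conn.execute("""
    SELECT e.Name
    FROM Position AS p
    JOIN Employee    AS e ON e.EmployeeID = p.EmployeeID
    JOIN Division    AS v ON v.DivisionID = p.DivisionID
    JOIN OrgFunction AS f ON f.FunctionID = p.FunctionID
    WHERE v.Name = ? AND f.Name = ?
    ORDER BY e.Name
""", ("Marketing", "Payroll")).fetchall()
payroll_in_marketing = [name for (name,) in rows]
print(payroll_in_marketing)  # ['Bob']
```

Nothing in this sketch stops a Position row from combining, say, Marketing with a function that Marketing does not have; that would need extra constraints between the master tables.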

TFD
I believe your solution is not completely normalized - for example, Finance in Marketing has no Tax and Finance in Textile has no Accounts. That is something that your model cannot reflect. Your solution is normalized only if there are no functional dependencies between DepartmentID, FunctionID, EmployeeID (if they are independent)
Unreason
correction: that was supposed to be DepartmentID, FunctionID and DivisionID
Unreason
Duh! It's a sample to show the possibilities. You can add dependencies as required (example shown for department and division). We don't have enough information to make it fully *normalised*, @Remnant can just add a relationship from department to function etc
TFD
@TFD: Really? I thought I quoted your example data from the question, which clearly shows that the star model is not a good choice.
Unreason
A: 

Perhaps (probably) you should consider the HR department of the Textile division as a different department than the HR department of the Marketing division.
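
A minimal sketch of that idea, assuming a strict hierarchy in which every department row belongs to exactly one division and every function row to exactly one department. It uses Python's built-in sqlite3 module only so that it is runnable; the table names (org_function in particular), column names and sample employees are hypothetical, and the single functionId per employee follows the asker's later comment that an employee works for only one function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE division (
        divisionId INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE
    );
    -- 'HR' appears once per division, as two distinct department rows.
    CREATE TABLE department (
        departmentId INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        divisionId   INTEGER NOT NULL REFERENCES division(divisionId),
        UNIQUE (divisionId, name)
    );
    CREATE TABLE org_function (
        functionId   INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        departmentId INTEGER NOT NULL REFERENCES department(departmentId),
        UNIQUE (departmentId, name)
    );
    CREATE TABLE employee (
        employeeId INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        functionId INTEGER NOT NULL REFERENCES org_function(functionId)
    );

    INSERT INTO division VALUES (1, 'Textile'), (2, 'Marketing');
    INSERT INTO department VALUES (1, 'HR', 1), (2, 'HR', 2);
    INSERT INTO org_function VALUES
        (1, 'Payroll', 1),   -- Payroll in Textile HR
        (2, 'Payroll', 2),   -- Payroll in Marketing HR
        (3, 'Hiring',  2);   -- Hiring in Marketing HR
    INSERT INTO employee VALUES (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Carol', 3);
""")

# Walk up the hierarchy: employee -> function -> department -> division.
rows = conn.execute("""
    SELECT e.name
    FROM employee AS e
    JOIN org_function AS f ON f.functionId   = e.functionId
    JOIN department   AS d ON d.departmentId = f.departmentId
    JOIN division     AS v ON v.divisionId   = d.divisionId
    WHERE v.name = 'Marketing' AND f.name = 'Payroll'
    ORDER BY e.name
""").fetchall()
payroll_in_marketing = [name for (name,) in rows]
print(payroll_in_marketing)  # ['Bob']
```

The trade-off is that names such as 'HR' and 'Payroll' are repeated, once per division or department they occur in.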

erikkallen
+1  A: 

Based on the updated post, and making some (fairly obvious) assumptions based on the names used, I come up with the following. There are four entities:

  • Divisions
  • Departments
  • Functions
  • Employees

There are many relationships between these entities. Few of them are hierarchical, most are simple associations:

  • Option A1: There is a master list of functions. Every department can perform (or do) one or more functions, and a function might be performed by more than one department.
  • Option A2: Functions are “owned” by departments. No function can be performed by two or more departments. (This appears to be the case, as the HR Dept has Payroll and Hiring, and the Finance Dept has Audit, Tax, and Accounts.)

  • Functions are performed by departments for (on behalf of) divisions. (HR Dept does Payroll and Hiring for both Textile and Marketing divisions; Finance Dept does Audit and Tax--but not Accounts--for Textile division, and Audit and Accounts--but not Tax--for Marketing division.) Perhaps a bit more precisely, departments perform selected functions for selected divisions that they are associated with, and that association is defined by their performance of that function.

  • Beyond performing the work of functions, there appears to be no relationship between departments and divisions. There is no hierarchical relationship between them, as one does not “own” or contain the other.

This leads to these roughly sketched out tables:

--  Division  -----
DivisionId  (primary key)

--  Department  ---
DepartmentId  (primary key)

--  Function  -----  (assumes option A2)
FunctionId   (primary key)
DepartmentId (foreign key, references Department)

--  DivisionFunctions  ----
DivisionId  (First column of compound primary key)
FunctionId  (Second column of compound primary key)

(You could optionally include a surrogate key to uniquely identify each row, but DivisionId + FunctionId would work.)
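
A minimal runnable sketch of these four tables, using Python's built-in sqlite3 module and the sample organisation from the question. The Name columns and the OrgFunction table name (used instead of Function, which is a reserved word in SQL Server) are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Division   (DivisionId   INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
    CREATE TABLE Department (DepartmentId INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);

    -- Option A2: each function is owned by exactly one department.
    CREATE TABLE OrgFunction (
        FunctionId   INTEGER PRIMARY KEY,
        Name         TEXT NOT NULL UNIQUE,
        DepartmentId INTEGER NOT NULL REFERENCES Department(DepartmentId)
    );

    -- Which functions are performed for which divisions.
    CREATE TABLE DivisionFunctions (
        DivisionId INTEGER NOT NULL REFERENCES Division(DivisionId),
        FunctionId INTEGER NOT NULL REFERENCES OrgFunction(FunctionId),
        PRIMARY KEY (DivisionId, FunctionId)
    );

    INSERT INTO Division   VALUES (1, 'Textile'), (2, 'Marketing');
    INSERT INTO Department VALUES (1, 'HR'), (2, 'Finance');
    INSERT INTO OrgFunction VALUES
        (1, 'Payroll', 1), (2, 'Hiring', 1),
        (3, 'Audit', 2), (4, 'Tax', 2), (5, 'Accounts', 2);
    INSERT INTO DivisionFunctions VALUES
        (1, 1), (1, 2), (1, 3), (1, 4),   -- Textile: Payroll, Hiring, Audit, Tax
        (2, 1), (2, 2), (2, 3), (2, 5);   -- Marketing: Payroll, Hiring, Audit, Accounts
""")

# Which functions does the Finance department perform for Marketing?
rows = conn.execute("""
    SELECT f.Name
    FROM DivisionFunctions AS df
    JOIN OrgFunction AS f ON f.FunctionId   = df.FunctionId
    JOIN Department  AS d ON d.DepartmentId = f.DepartmentId
    JOIN Division    AS v ON v.DivisionId   = df.DivisionId
    WHERE v.Name = 'Marketing' AND d.Name = 'Finance'
    ORDER BY f.Name
""").fetchall()
finance_for_marketing = [name for (name,) in rows]
print(finance_for_marketing)  # ['Accounts', 'Audit']
```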

There isn’t enough material here to fully describe how "employees" fit into the model. Given that employees do the work of functions: can an employee do the work of more than one function, or do they only do the one? Does an employee do the work of the function regardless of the division(s) it is being done for, or are they assigned to do the work for one or more divisions? Two obvious options here, though more complex variants are possible:

  • Option B1: Employees do the work of one or more functions within departments, and perform that work for all divisions that require that function of that department.
  • Option B2: Employees are assigned to perform a specific function for a specific division.

Given these, tables might look like:

--  Employee  -----  (assumes option B1)
EmployeeId    (primary key)
DepartmentId  (foreign key, references Department)

--  EmployeeFunction  -----  (assumes option B1)
EmployeeId  (First column of compound primary key)
FunctionId  (Second column of compound primary key)

... and thus all employees that can perform a function will perform it for all divisions requiring it. Or,

--  Employee  -----  (assumes option B2)
EmployeeId  (primary key)
DepartmentId  (foreign key, references Department)

--  EmployeeAssignment  -----  (assumes option B2)
EmployeeId  (foreign key, references Employee)
DivisionId  (first of two-column foreign key referencing DivisionFunctions)
FunctionId  (second of two-column foreign key referencing DivisionFunctions)

(Or, instead of DivisionId and FunctionId, include the optional surrogate key from DivisionFunctions.) ... and thus employees are assigned individually to functions to be performed by the department for a division.
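
A minimal sketch of option B2, again using Python's built-in sqlite3 module so that it runs as-is. Several things here are assumptions: the Department table is left out to keep the example short, OrgFunction stands in for Function, the Name columns and sample rows are hypothetical, and EmployeeId is made the primary key of EmployeeAssignment on the basis of the asker's comment that an employee works for only one function and division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Division    (DivisionId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE OrgFunction (FunctionId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE DivisionFunctions (
        DivisionId INTEGER NOT NULL REFERENCES Division(DivisionId),
        FunctionId INTEGER NOT NULL REFERENCES OrgFunction(FunctionId),
        PRIMARY KEY (DivisionId, FunctionId)
    );
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- EmployeeId as primary key allows at most one assignment per employee.
    CREATE TABLE EmployeeAssignment (
        EmployeeId INTEGER PRIMARY KEY REFERENCES Employee(EmployeeId),
        DivisionId INTEGER NOT NULL,
        FunctionId INTEGER NOT NULL,
        FOREIGN KEY (DivisionId, FunctionId)
            REFERENCES DivisionFunctions(DivisionId, FunctionId)
    );

    INSERT INTO Division    VALUES (1, 'Textile'), (2, 'Marketing');
    INSERT INTO OrgFunction VALUES (1, 'Payroll'), (2, 'Tax');
    INSERT INTO DivisionFunctions VALUES (1, 1), (1, 2), (2, 1);  -- no Tax in Marketing
    INSERT INTO Employee VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO EmployeeAssignment VALUES (1, 1, 1), (2, 2, 1);
""")

# All employees who work in the Payroll function of the Marketing division.
rows = conn.execute("""
    SELECT e.Name
    FROM EmployeeAssignment AS a
    JOIN Employee    AS e ON e.EmployeeId = a.EmployeeId
    JOIN Division    AS v ON v.DivisionId = a.DivisionId
    JOIN OrgFunction AS f ON f.FunctionId = a.FunctionId
    WHERE v.Name = 'Marketing' AND f.Name = 'Payroll'
""").fetchall()
payroll_in_marketing = [name for (name,) in rows]
print(payroll_in_marketing)  # ['Bob']

# Assigning Carol to Tax in Marketing is rejected, because that
# (division, function) pair does not exist in DivisionFunctions.
try:
    conn.execute("INSERT INTO EmployeeAssignment VALUES (3, 2, 2)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```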

But that still leaves a lot of “what if/when” questions: Do employees “belong to” departments? Can employees belong to (work for) multiple departments? Perhaps employees belong to divisions? Do you track what functions an employee can do, even if they are not currently doing it? Similarly, do you track what department an employee works for, even if they are currently “between functions”? If an employee can perform functions A and B, and a division requires both these functions, might an employee be assigned to only perform A and not B for that division?

There’s more requirements research to be done here, but I’d like to think this is a good start.

Philip Kelley
Philip - Great, comprehensive response. I think I could really use this to augment my thinking and approach - I'll digest this over the next 24 hours. Quick note - employees can only work for one function. They cannot work for multiple functions or divisions.
Remnant
+1  A: 

As you are "abecedarian" :), one thing to do before any attempt to feel at home with database design is to read about normalization, and to completely understand all the normal forms up to 5NF.

If you want to model that
1. departments are in divisions
2. functions are performed in departments
3. employees perform functions

and that not all functions are performed in all of the departments, nor are all the departments in all divisions, then you have to store that fact somewhere.

While doing logical design, give your tables descriptive names. So, some departments are in divisions:

departments_in_divisions
candidate key: department, division

then you have some functions in some departments

functions_departments_divisions
candidate key: function, department, division
references: (department, division) in departments_in_divisions

then employees have some functions from some departments and divisions

employees_function_department_division
candidate key: employee, function, department, division
references: (function, department, division) in functions_departments_divisions
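
A minimal sketch of this chain of tables with natural (name) keys and composite foreign keys, using Python's built-in sqlite3 module so that it can be run directly. The column types and sample rows are assumptions; note also that a column called function is accepted by SQLite but would need quoting in SQL Server, where FUNCTION is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE departments_in_divisions (
        department TEXT NOT NULL,
        division   TEXT NOT NULL,
        PRIMARY KEY (department, division)
    );
    CREATE TABLE functions_departments_divisions (
        function   TEXT NOT NULL,
        department TEXT NOT NULL,
        division   TEXT NOT NULL,
        PRIMARY KEY (function, department, division),
        FOREIGN KEY (department, division)
            REFERENCES departments_in_divisions(department, division)
    );
    CREATE TABLE employees_function_department_division (
        employee   TEXT NOT NULL,
        function   TEXT NOT NULL,
        department TEXT NOT NULL,
        division   TEXT NOT NULL,
        PRIMARY KEY (employee, function, department, division),
        FOREIGN KEY (function, department, division)
            REFERENCES functions_departments_divisions(function, department, division)
    );

    INSERT INTO departments_in_divisions VALUES
        ('HR', 'Textile'), ('Finance', 'Textile'),
        ('HR', 'Marketing'), ('Finance', 'Marketing');
    INSERT INTO functions_departments_divisions VALUES
        ('Payroll', 'HR', 'Textile'), ('Payroll', 'HR', 'Marketing'),
        ('Tax', 'Finance', 'Textile'), ('Accounts', 'Finance', 'Marketing');
    INSERT INTO employees_function_department_division VALUES
        ('Alice', 'Payroll', 'HR', 'Textile'),
        ('Bob',   'Payroll', 'HR', 'Marketing');
""")

# With natural keys the question's query needs no joins at all.
rows = conn.execute("""
    SELECT employee
    FROM employees_function_department_division
    WHERE function = 'Payroll' AND division = 'Marketing'
""").fetchall()
payroll_in_marketing = [name for (name,) in rows]
print(payroll_in_marketing)  # ['Bob']

# Tax is not performed by Finance in Marketing, so this row is rejected.
try:
    conn.execute(
        "INSERT INTO employees_function_department_division VALUES (?, ?, ?, ?)",
        ("Carol", "Tax", "Finance", "Marketing"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```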

After (or before) this you have 3 more entities: functions, departments and divisions, which list all the possible departments, divisions and functions and which would also be referenced by the above tables (this might not be completely normalized).

Also the names of the entities (tables) can become something more appropriate to you (only you can know the full semantics of the model of your data). Especially if you notice that you need to assign other attributes (fields) to them.

The values for departments, divisions and functions are their names; there are no artificial ids yet in the above analysis. You can introduce them in the next step (after the logical modelling comes the physical modelling), or you can keep the natural keys. If you go with artificial keys, that can cut down the usage of composite keys to max 2, but it does obfuscate the relationships and the meaning of the facts that you are storing in your tables. (Example: functionID can be an ID of a function name or an ID of a function that is performed in a certain division/department combination - it is not clear which it is, and these are not interchangeable; sort of like the difference between an instance and a class.)

Unreason