views:

54

answers:

4

What's the best method of storing a large number of booleans in a database table?

Should I create a column for each boolean value or is there a more optimal method?

Employee Table

IsHardWorking
IsEfficient
IsCrazy
IsOverworked
IsUnderpaid
...etc.

+1  A: 

I don't see a problem with having a column for each boolean. But if you foresee any future expansion, and want to use the table only for booleans, then use a 2-column table with VARIABLE and VALUE columns, with a row for each bool.

brydgesk
A: 

Something to consider: how often will you be adding/changing/removing these booleans? If they're not likely to change then you'll probably like having them as individual columns. Many databases will probably pack them for you, especially if they're adjacent in the row, so they'll be stored efficiently.

If, on the other hand, you see yourself wanting to add/change/remove these booleans every once in a while you might be better served by something like (excuse PostgreSQL-isms and shoddy names):

CREATE TABLE employee_qualities (
    id SERIAL8 PRIMARY KEY,
    label TEXT UNIQUE
);

CREATE TABLE employee_employee_qualities (
    employee_id INT8 REFERENCES employee (id),
    quality_id INT8 REFERENCES employee_qualities (id),
    UNIQUE (employee_id, quality_id)
);
draebek
A: 

A column for each is the best representation of your business requirements. You could combine a bunch of bools into a single int column and use bit masks to read the values, but this seems unnecessarily complex, and is something I would consider only if there was some high-end performance need for it.

Also, if you are using sql server, up to 8 bit fields get combined internally into a single int, so the performance thing is sort-of done for you already. (I don;t know if other dbs do this.)

Ray
+1  A: 

If the majority of employees will have the same values across a large sample size, it can be more efficient to define a hierarchy, allowing you to establish default values representing the norm, and override them per employee if required.

Your employee table no longer stores these attributes. Instead I would create a definition table of attributes:

| ATTRIBUTE_ID | DESCRIPTION     | DEFAULT |
| 1            | Is Hard Working | 1       |
| 2            | Is Overpaid     | 0       |

Then a second table joining attributes to Employees:

| EMPLOYEE_ID | ATTRIBUTE_ID | OVERRIDE |
| 2           | 2            | 1        |

Given two employees, employee with ID 1 doesn't have an override entry, and thus inherits the default attribute values (is hard working, is not overpaid), however employee 2 has an override for attribute 2 - Is Overpaid, and is thus both hard working and overpaid.

For integrity you could place a unique constraint on the EMPLOYEE_ID and ATTRIBUTE_ID columns in the override table, enforcing you can only override an attribute once per employee.

Michael Shimmins