As stated in the title, when designing databases, what is the preferred way to handle tables that have multiple columns that are just storing true / false values as just a single either or value (e.g. "Y/N: or "0/1")? Likewise, are there some issues that might arise between different databases (e.g. Oracle and SQL Server) that might affect how the columns are handled?
views:
412answers:
10Bit columns are typically used to represent T/F or Y/N type values, at least in SQL Server. Although a database purist might tell you that Bit columns have no place in databases because they are "too close to the hardware" - Joe Celko.
Use whatever makes sense for the specific database engine you're using. It's the interface to the database that needs to handle it. If the code-side interface to the database is sufficiently modularized, then it will be nothing more than a simple one-line change to handle a different boolean type in the underlying database.
In SQL Server
, there is BIT
datatype. You can store 0 or 1 there, compare the values but not run MIN
or MAX
.
In Oracle
, you just use NUMBER
or CHAR(1)
.
In MySQL
and PostgreSQL
any datatype is implicitly convertible to BOOLEAN
.
Both systems support BOOLEAN
datatype which you can use as is, without the operators, in the WHERE
or ON
clauses:
SELECT *
FROM mytable
WHERE col1
, which is impossible in SQL Server
and Oracle
(you need to have some kind or a predicate there).
In MySQL
, BOOLEAN
is a synonym for TINYINT(1)
.
In PostgreSQL
too (in terms of storage), but logically, it's not implicitly convertible to any other type.
From my own experience, I prefer char(1) for 'Y' or 'N'. Using 0 and 1 can be a bit confusing depending on how many beers I already drank and C++ main() function return 0 on success. ENUM and BIT types are more trouble than they are worth.
It is interesting to note that MySQL information_schema
uses VARCHAR(3) for 'YES' or 'NO'.
Example:
information_schema.`USER_PRIVILEGES (
...
IS_GRANTABLE VARCHAR(3) NOT NULL DEFAULT ''
)
Instead of boolean datatypes, you may want to consider another data model for storing boolean values, which may be particularly appropriate in the following cases:
- when you will be having many yes/no columns.
- when you probably require adding more yes/no columns in the future.
- when the yes/no values do not change very frequently.
Defining user permissions may be a typical example of the above. Consider the following tables:
Table "Users": (user_id, name, surname, country)
Table "Permissions": (permission_id, permission_text)
Table "Users_Permissions": (user_id, permission_id)
In the Permissions
table you would define all the possible permissions that may be applicable to users. You would have to add a row to the Permissions
table for each yes/no attribute. As you may have noticed, this makes it very easy to add new permissions in the future without having to modify the database schema.
With the above model, you would then indicate a TRUE value by assigning a user_id
with a permission_id
in the Users_Permissions
table. Otherwise it would be FALSE by default.
For example:
Table "Permissions"
permission_id text
-----------------------------------
1 "Read Questions"
2 "Answer Questions"
3 "Edit Questions"
4 "Close Questions"
Table "Users_Permissions"
user_id permission_id
-----------------------------------
1 1
1 2
1 3
2 1
2 3
Advantages
- Indexing: you can easily use an index on to query for specific facts.
- Space: The default convention saves space when you have many false values.
- Normalized: Facts are defined in their own tables (in the
Permissions
andUsers_Permissions
tables.) You can easily store more information on each fact.
Disadvantages
- Queries: Simple queries would require JOINs.
- Setting to False: To set a value to false, you would have to delete a row (from the
Users_Permissions
table.) Otherwise you can use a 'deleted' flag in theUsers_Permissions
table, which would also allow you to store information for audit trails, such as when a permission was modified and by whom. If you delete the row, you would not be able to store this information.
I would typically do this without BIT/BOOLEAN values at all. Instead I'd have three tables. Let's say we have a project management system which has projects, and these projects have a whole bunch of attributes.
Then we have the tables:
Project - Project_ID (INT), - Name (VARCHAR) Attribute - Attribute_ID (INT), - Name (VARCHAR) ProjectAttribute_Rel - Project_ID (INT), - Attribute_ID (INT)
Whether a project's attribute is true or false depends on whether there's a line for it in ProjectAttribute_Rel.
Typically, you'd be dealing with Attribute_IDs in your code, so when you read the attributes of the project (where you presumably have the Project_ID), you just do (PHP arbitrarily used as an example):
$arrAttributes = array();
$oQuery = mysql_query('
SELECT Attribute_ID
FROM ProjectAttribute_Rel
WHERE Project_ID = '.addslashes($iProjectId).'
');
while ($rowAttribute = mysql_fetch_assoc($oQuery)) {
$arrAttributes[] = $rowAttribute['Attribute_ID'];
}
At this point, you can check whether the project's attribute is true by checking whether it exists in $arrAttributes at all. In PHP, this would be:
if (in_array($arrAttributes, $iAttributeId)) {
// Project attribute is true!
}
This approach also allows you to do all sorts stunts to avoid listing a plethora of attributes when you update, again when you select (because SELECT * is bad in code), when you insert and so forth. This is because you can always loop through the table Attribute to find the available attributes, so if you add one and you do things this way, adding/editing/deleting attributes is trivial. Odds are your SQL won't even need to be changed, because the attributes themselves are defined in database, not in code.
Hope this helps.
If your DBMS supports a boolean data type, like MySQL, use it. If it doesn't, like Oracle, I typically use a char(1) with values of Y or N. In the latter case, it's a good idea to write a couple of functions to convert your Java or C++ or whatever boolean type to and from Y/N so you avoid having redundant code to do this. It's a pretty trivial function, but it will have to deal with cases like nulls or values other than Y or N and you want to do that consistently.
I would definitely NOT pack flags into a single variable with bit operations. Yes, this will save some disk space, but the price is much greater complexity and opportunities for error. If your DBMS doesn't support bit operations -- and as I've never had a desire to do such a thing, I don't know off the top of my head which, if any, do -- then you'll have a real hard time selecting or sorting based on such a flag. Sure, you could retrieve all records meeting other criteria and then have the calling code weed out the ones with the proper flag value. But what if only a small percentage of the records have the desired flag value, and you have a query that joins many other records? Like "select employee.name, sum(pay.amount) from employee join pay using (employee_id) where employee.executive=true and pay.bonus=true". With the where clause, you probably retrieve a very modest number of records. Without it, you retrieve the entire database.
Disk space is cheap these days, so any disk savings would likely be unimportant. If you really have a huge number of flags -- like hundreds or thousands of them per record -- I suppose there might be a case for packing them. But that would be way down my list of design choices.
Edit: Let me elaborate on writing a class to convert your "SQL boolean" to a "Java boolean". The same applies to any language, but I'll use Java as the example.
If your DBMS has a built-in boolean type, then with Java you can read this directly into a boolean variable with ResultSet.getBoolean().
But if you have to store it as, say, a char "Y" or "N", then you have to read it into a string. So it makes sense to me to declare a class like this:
class MyBoolean
{
boolean value;
final static MyBoolean TRUE=new MyBoolean(true), FALSE=new MyBoolean(false);
public MyBoolean(boolean b)
{
value=b;
}
public MyBoolean(String s)
{
if (s==null)
return null;
else if (s.equals("Y"))
return MyBoolean.TRUE;
else
return MyBoolean.FALSE;
}
public static String toString(MyBoolean b)
{
if (b==null)
return null;
else if (b.value)
return "Y";
else
reutrn "N";
}
public String toString()
{
return toString(this);
}
}
Then you can easily pick up booleans from the database with "MyBoolean flag=new MyBoolean(rs.getString("flag"));" and write to the database with "rs.setString("flag", flag.toString());"
And of course you can then add whatever other logic you need to the class if you have other boolean stuff you need to do. If for some purposes you want to display booleans as T/F or Yes/No or On/Off or whatever, you can just add alternate toString variants -- toTFString or toString(value,truetext,falsetext) or whatever -- rather than writing similar code over and over.
I think "Y/N" values are more meaningful than "1/0". With Oracle, I would do the following to have the data validated as much as possible by the database engine:
- define the columns as char(1)
- add a check constraint that possible values are restricted to "in ('Y', 'N')
- if consistent with business rules, make them not nullable - this can prevent problems when you implicitly assume that anything that is not 'Y' has a value of 'N' in your SQL
Instead of adding a column, I suggest you create another table. Hear me out...
Suppose you have a table named Customer
:
CREATE TABLE Customer
(
CustomerID NUMBER,
Name VARCHAR(100)
)
Now, suppose you want to indicate whether or not a customer is allowed to show up in search results. One option is to add some column that represents one of two possible states:
CREATE TABLE Customer
(
CustomerID NUMBER,
Name VARCHAR(100),
Searchable BOOLEAN /* or CHAR(1) or BIT... */
)
Your search query will look something like this:
SELECT CustomerID, Name
FROM Customer
WHERE Name LIKE '%TheCustomerNameIAmLookingFor%'
AND Searchable = TRUE /* or 'Y' or 0... */
This is nice and simple. Many people on this thread are giving good advice for picking which data type this column should be in order to make the syntax play nicely across various databases.
Alternative: Creating a separate table
Instead of adding another column to Customer
, I will create a separate table that stores the CustomerID
of every searchable customer.
CREATE TABLE Customer
(
CustomerID NUMBER,
Name VARCHAR(100)
)
CREATE TABLE SearchableCustomer
(
CustomerID NUMBER
)
In this case, a customer is considered searchable if their CustomerID
exists in the SearchableCustomer
table. The query for searching for customers now becomes:
SELECT CustomerID, Name
FROM Customer
WHERE Name LIKE '%TheCustomerNameIAmLookingFor%'
AND CustomerID IN (SELECT CustomerID FROM SearchableCustomer)
You will see that this strategy is very transferable across RDBMSs:
- Finding searchable customers uses the
IN
clause or aJOIN
- Making a customer searchable uses the
INSERT
statement - Making a customer non-searchable uses the
DELETE
statement
A Surprise Benefit
You are free to make the definition of a searchable customer as elaborate as you want if you make SearchableCustomer
a view instead of a table:
CREATE VIEW SearchableCustomer AS
SELECT CustomerID
FROM Customer
WHERE Name LIKE 'S%' /* For some reason, management only cares about customers whose name starts with 'S' */
Your search query doesn't change at all! :D In my experience, this has led to tremendous flexibility.
"SELECT * FROM mytable WHERE col1
, which is impossible in SQL Server and Oracle (you need to have some kind or a predicate there)."
Which goes only to show what a ludicrous and laughable abomination Oracle and SQL server really are.
If col1 is declared to be of type BOOLEAN, then the expression "col1" IS a predicate.
If the semantics of the WHERE clause require that its expression just evaluates to a truth value, and if some column is declared to be of the type "truth value", then "WHERE that-column" should be both allowed and supported. Period. Any system that doesn't just exposes its authors for the incompetent mediocre quacks that they are.