views:

412

answers:

10

As stated in the title, when designing databases, what is the preferred way to handle tables that have multiple columns that are just storing true / false values as just a single either or value (e.g. "Y/N: or "0/1")? Likewise, are there some issues that might arise between different databases (e.g. Oracle and SQL Server) that might affect how the columns are handled?

A: 

Bit columns are typically used to represent T/F or Y/N type values, at least in SQL Server. Although a database purist might tell you that Bit columns have no place in databases because they are "too close to the hardware" - Joe Celko.

Randy Minder
+2  A: 

Use whatever makes sense for the specific database engine you're using. It's the interface to the database that needs to handle it. If the code-side interface to the database is sufficiently modularized, then it will be nothing more than a simple one-line change to handle a different boolean type in the underlying database.

Aaron
While true, I think the question isn't about this. I believe that, like in many cases, the questioner has hidden the real reason for his query which is he's writing a app that will support more than one database vendor... so he needs to make the least problematic choice of datatypes.
Stephanie Page
+10  A: 

In SQL Server, there is BIT datatype. You can store 0 or 1 there, compare the values but not run MIN or MAX.

In Oracle, you just use NUMBER or CHAR(1).

In MySQL and PostgreSQL any datatype is implicitly convertible to BOOLEAN.

Both systems support BOOLEAN datatype which you can use as is, without the operators, in the WHERE or ON clauses:

SELECT  *
FROM    mytable
WHERE   col1

, which is impossible in SQL Server and Oracle (you need to have some kind or a predicate there).

In MySQL, BOOLEAN is a synonym for TINYINT(1).

In PostgreSQL too (in terms of storage), but logically, it's not implicitly convertible to any other type.

Quassnoi
Is there any design issue involved with having multiple fields, or are they considered fine as is? For example, should they be merged and have the application use bitwise operations?
Rob
One addition: If you use multiple BIT columns in a table in SQL Server, it will combine them together. For example, if you have a single BIT column, it will use 1 byte storage. But, if you have 8 BIT columns, it will still use 1 byte to store both. Oracle on the other hand, allocate 1 byte for each.
Ekin Koc
@Ekin: Nice point
Quassnoi
@Rob: if you are going to search for the results of these bitwise operations, you better split the fields so that the system can combine the indexes and use bitmap access to the table. In Oracle, you also can create persistent bitmap indexes.
Quassnoi
@Quassnoi - In the database there is a mix of fields that are being used for flags that are in the application domain (i.e. don't need to be human readable) and some that are used for yes/no entries that need to be human readable. There maybe one ore two fields that would be queried on, but it is a bit too early to say definitively one way or the other.
Rob
@Rob, who cares if they need to be "human readable", store them the best way possible, and display/format as necessary in the application
KM
@KM - Very true and I tend to follow that line of thinking as well; however, I have met some developers that prefer to keep somethings in the database partly human readable for debugging purposes. Likewise, some also keep the fields human readable as a means of making the output processing easier.
Rob
@Rob: I would steer clear of developers for whom `1` and `0` is not human readable enough for debugging purposes.
Quassnoi
+4  A: 

From my own experience, I prefer char(1) for 'Y' or 'N'. Using 0 and 1 can be a bit confusing depending on how many beers I already drank and C++ main() function return 0 on success. ENUM and BIT types are more trouble than they are worth.

It is interesting to note that MySQL information_schema uses VARCHAR(3) for 'YES' or 'NO'.

Example:

information_schema.`USER_PRIVILEGES (
  ...
  IS_GRANTABLE VARCHAR(3) NOT NULL DEFAULT ''
) 
Yada
The only downside to this is that using a `CHAR` type allows possible values that include 'Y' and 'N', AND '1' and '0'. In your code you (or another developer) might write `if (myTrueFalseFieldFromDatabase.equals("1")) { ... }`, when you mean to compare against `"Y"`. The benefit to using an `INT` or `BIT` type is that only ones and zeros are allowed, so you can't make this mistake.But as long as you're careful this isn't much of a problem.
Tenner
Frankly, I'd discourage using ENUM('Y','N') simply because they both translate into 1/true in other programming languages. If you use 0 and 1, even if it's just ENUM(1,0) instead of BIT/BOOLEAN, then the 0 will translate into false in "soft" languages like Perl, PHP, etc. 0 and 1 are classic ways of denoting boolean values and will be interpreted accurately by most programming languages.
Helgi Hrafn Gunnarsson
@Tenner: an INT can usually store more values than just 0 or 1. If you put a check constraint on the column (assuming that MySQL supports them), then the values permitted can be just 'Y' and 'N' (or '1' and '0'). I'd go with Y/N, but there isn't a major problem with 1/0 instead - in a CHAR(1) field.
Jonathan Leffler
If you must roll your own boolean field, I suggest having a class that converts the language (C++, Java, whatever) boolean to the representation you want, and use it consistently. This avoids problems with someone slipping up and using T/F instead of Y/N in one module, or using lower case instead of upper, or whatever.
Jay
It's almost as if the people on the site haven't heard of constraints. @Jay and @Tenner, Jonathan points out that concerns over confusion for what the proper values should be are unfounded. A check constraint will solve the T/F or Upper/Lower issue with zero ambiguity.
Stephanie Page
@Stephanie: True. I guess that makes this a trivial advantage of using a "translator" class. The real advantage is that it prevents redundant code and encourages consistency. See the edit to my response below.
Jay
+3  A: 

Instead of boolean datatypes, you may want to consider another data model for storing boolean values, which may be particularly appropriate in the following cases:

  • when you will be having many yes/no columns.
  • when you probably require adding more yes/no columns in the future.
  • when the yes/no values do not change very frequently.

Defining user permissions may be a typical example of the above. Consider the following tables:

Table "Users":             (user_id, name, surname, country)

Table "Permissions":       (permission_id, permission_text)

Table "Users_Permissions": (user_id, permission_id)

In the Permissions table you would define all the possible permissions that may be applicable to users. You would have to add a row to the Permissions table for each yes/no attribute. As you may have noticed, this makes it very easy to add new permissions in the future without having to modify the database schema.

With the above model, you would then indicate a TRUE value by assigning a user_id with a permission_id in the Users_Permissions table. Otherwise it would be FALSE by default.

For example:

Table "Permissions"

permission_id   text
-----------------------------------
1               "Read Questions"
2               "Answer Questions"
3               "Edit Questions"
4               "Close Questions"


Table "Users_Permissions"

user_id         permission_id
-----------------------------------
1               1
1               2
1               3
2               1
2               3

Advantages

  • Indexing: you can easily use an index on to query for specific facts.
  • Space: The default convention saves space when you have many false values.
  • Normalized: Facts are defined in their own tables (in the Permissions and Users_Permissions tables.) You can easily store more information on each fact.

Disadvantages

  • Queries: Simple queries would require JOINs.
  • Setting to False: To set a value to false, you would have to delete a row (from the Users_Permissions table.) Otherwise you can use a 'deleted' flag in the Users_Permissions table, which would also allow you to store information for audit trails, such as when a permission was modified and by whom. If you delete the row, you would not be able to store this information.
Daniel Vassallo
A: 

I would typically do this without BIT/BOOLEAN values at all. Instead I'd have three tables. Let's say we have a project management system which has projects, and these projects have a whole bunch of attributes.

Then we have the tables:

Project
 - Project_ID (INT),
 - Name (VARCHAR)

Attribute
 - Attribute_ID (INT),
 - Name (VARCHAR)

ProjectAttribute_Rel
 - Project_ID (INT),
 - Attribute_ID (INT)

Whether a project's attribute is true or false depends on whether there's a line for it in ProjectAttribute_Rel.

Typically, you'd be dealing with Attribute_IDs in your code, so when you read the attributes of the project (where you presumably have the Project_ID), you just do (PHP arbitrarily used as an example):

$arrAttributes = array();
$oQuery = mysql_query('
    SELECT Attribute_ID
    FROM ProjectAttribute_Rel
    WHERE Project_ID = '.addslashes($iProjectId).'
');
while ($rowAttribute = mysql_fetch_assoc($oQuery)) {
    $arrAttributes[] = $rowAttribute['Attribute_ID'];
}

At this point, you can check whether the project's attribute is true by checking whether it exists in $arrAttributes at all. In PHP, this would be:

if (in_array($arrAttributes, $iAttributeId)) {
    // Project attribute is true!
}

This approach also allows you to do all sorts stunts to avoid listing a plethora of attributes when you update, again when you select (because SELECT * is bad in code), when you insert and so forth. This is because you can always loop through the table Attribute to find the available attributes, so if you add one and you do things this way, adding/editing/deleting attributes is trivial. Odds are your SQL won't even need to be changed, because the attributes themselves are defined in database, not in code.

Hope this helps.

Helgi Hrafn Gunnarsson
OMG, more people recommending EAV constructs. Shoot me now.
Stephanie Page
But then instead of just pulling fields out of a record in a result set, we have to loop through the results looking for the fields we want. A mis-spelled field name is difficult to distinguish from a field with a null value. Instead of the schema being, by definition, a definitive list of what attributes apply to any given table, there is no definitive list anywhere. (Someone might write it on a piece of paper, but who's going to insure it's current.) I would think this is a solution to a very specilized category of problems, not something you'd want to do in general.
Jay
I'm sorry if this comes off as somewhat rude, but in my personal opinion, mis-spelled field names are very easily spotted by looking at them. I don't consider this any more of a real problem than the "missing semi-comma" problem.Of course there are SOME benefits to simply hard-coding attributes, but the last time I checked, hard-coding such things was discouraged precisely because then they're written in stone.To get a list of attributes, "SELECT Name FROM Attributes;" - I fail to see the problem.Please remember, geek disagreements always seem more hostile than they are. :)
Helgi Hrafn Gunnarsson
Helgi, what you suggest is so heinous it has its own name. It's called an EAV model. You say "such things are discouraged". You're right, in the programming world... not so much in the database world. Every traditional RDBMS will SUCK EGGS if you design an EAV as you describe. Just go here http://stackoverflow.com/questions/1940327/can-an-attribute-designate-one-table-over-another/1942193#1942193 and read the link from there.
Stephanie Page
I guess we'll just have to disagree then. Perhaps I'm just the greatest programmer that the world has ever seen, but I have never run into any problems with this way of doing things at all. I fail to see the distinction between misspelled field names and a missing semi-comma and I have zero respect for either one of those problems. I don't think it's too much to ask of programmers, and indeed database designers, to be able to read the code in front of them. Sorry if that comes off as harsh or inconsiderate, but that's how I feel about it.
Helgi Hrafn Gunnarsson
Helgi, did you follow the link? It has nothing to do with reading the code, it has nothing to do with misspelled field names, it has everything to do with performance.Read the link. That programmer was also a genius. But the problem is the RDBMS just isn't there to support you. I should say traditional RDBMS.
Stephanie Page
+2  A: 

If your DBMS supports a boolean data type, like MySQL, use it. If it doesn't, like Oracle, I typically use a char(1) with values of Y or N. In the latter case, it's a good idea to write a couple of functions to convert your Java or C++ or whatever boolean type to and from Y/N so you avoid having redundant code to do this. It's a pretty trivial function, but it will have to deal with cases like nulls or values other than Y or N and you want to do that consistently.

I would definitely NOT pack flags into a single variable with bit operations. Yes, this will save some disk space, but the price is much greater complexity and opportunities for error. If your DBMS doesn't support bit operations -- and as I've never had a desire to do such a thing, I don't know off the top of my head which, if any, do -- then you'll have a real hard time selecting or sorting based on such a flag. Sure, you could retrieve all records meeting other criteria and then have the calling code weed out the ones with the proper flag value. But what if only a small percentage of the records have the desired flag value, and you have a query that joins many other records? Like "select employee.name, sum(pay.amount) from employee join pay using (employee_id) where employee.executive=true and pay.bonus=true". With the where clause, you probably retrieve a very modest number of records. Without it, you retrieve the entire database.

Disk space is cheap these days, so any disk savings would likely be unimportant. If you really have a huge number of flags -- like hundreds or thousands of them per record -- I suppose there might be a case for packing them. But that would be way down my list of design choices.

Edit: Let me elaborate on writing a class to convert your "SQL boolean" to a "Java boolean". The same applies to any language, but I'll use Java as the example.

If your DBMS has a built-in boolean type, then with Java you can read this directly into a boolean variable with ResultSet.getBoolean().

But if you have to store it as, say, a char "Y" or "N", then you have to read it into a string. So it makes sense to me to declare a class like this:

class MyBoolean
{
  boolean value;
  final static MyBoolean TRUE=new MyBoolean(true), FALSE=new MyBoolean(false);
  public MyBoolean(boolean b)
  {
    value=b;
  }
  public MyBoolean(String s)
  {
    if (s==null)
      return null;
    else if (s.equals("Y"))
      return MyBoolean.TRUE;
    else
      return MyBoolean.FALSE;
  }
  public static String toString(MyBoolean b)
  {
    if (b==null)
      return null;
    else if (b.value)
      return "Y";
    else
      reutrn "N";
  }
  public String toString()
  {
    return toString(this);
  }
}

Then you can easily pick up booleans from the database with "MyBoolean flag=new MyBoolean(rs.getString("flag"));" and write to the database with "rs.setString("flag", flag.toString());"

And of course you can then add whatever other logic you need to the class if you have other boolean stuff you need to do. If for some purposes you want to display booleans as T/F or Yes/No or On/Off or whatever, you can just add alternate toString variants -- toTFString or toString(value,truetext,falsetext) or whatever -- rather than writing similar code over and over.

Jay
+2  A: 

I think "Y/N" values are more meaningful than "1/0". With Oracle, I would do the following to have the data validated as much as possible by the database engine:

  • define the columns as char(1)
  • add a check constraint that possible values are restricted to "in ('Y', 'N')
  • if consistent with business rules, make them not nullable - this can prevent problems when you implicitly assume that anything that is not 'Y' has a value of 'N' in your SQL
dpbradley
+1  A: 

Instead of adding a column, I suggest you create another table. Hear me out...

Suppose you have a table named Customer:

CREATE TABLE Customer
(
  CustomerID NUMBER,
  Name       VARCHAR(100)
)

Now, suppose you want to indicate whether or not a customer is allowed to show up in search results. One option is to add some column that represents one of two possible states:

CREATE TABLE Customer
(
  CustomerID NUMBER,
  Name       VARCHAR(100),
  Searchable BOOLEAN /* or CHAR(1) or BIT... */
)

Your search query will look something like this:

SELECT CustomerID, Name
  FROM Customer
 WHERE Name LIKE '%TheCustomerNameIAmLookingFor%'
   AND Searchable = TRUE /* or 'Y' or 0... */

This is nice and simple. Many people on this thread are giving good advice for picking which data type this column should be in order to make the syntax play nicely across various databases.

Alternative: Creating a separate table

Instead of adding another column to Customer, I will create a separate table that stores the CustomerID of every searchable customer.

CREATE TABLE Customer
(
  CustomerID NUMBER,
  Name       VARCHAR(100)
)

CREATE TABLE SearchableCustomer
(
  CustomerID NUMBER
)

In this case, a customer is considered searchable if their CustomerID exists in the SearchableCustomer table. The query for searching for customers now becomes:

SELECT CustomerID, Name
  FROM Customer
 WHERE Name LIKE '%TheCustomerNameIAmLookingFor%'
   AND CustomerID IN (SELECT CustomerID FROM SearchableCustomer)

You will see that this strategy is very transferable across RDBMSs:

  • Finding searchable customers uses the IN clause or a JOIN
  • Making a customer searchable uses the INSERT statement
  • Making a customer non-searchable uses the DELETE statement

A Surprise Benefit

You are free to make the definition of a searchable customer as elaborate as you want if you make SearchableCustomer a view instead of a table:

CREATE VIEW SearchableCustomer AS
SELECT CustomerID
  FROM Customer
 WHERE Name LIKE 'S%' /* For some reason, management only cares about customers whose name starts with 'S' */

Your search query doesn't change at all! :D In my experience, this has led to tremendous flexibility.

Adam Paynter
It's because of stuff like this that optimizers have to work so hard:CustomerID IN (SELECT CustomerID FROM SearchableCustomer)
Stephanie Page
Hmm, it's not real clear to me why "where customerid in (select customerid from searchable)" is something I'd prefer to write then "where searchable=true". And if I've got 3 such fields, then instead of 1 table I have 4 (the "master" plus 3 "flag tables"). If you're worried about cross-DBMS portability of booleans, okay, just use a char(1) instead of declaring it boolean.
Jay
@Jay: Fair enough. I tend to be a bit backwards in some regards. :) I tend toward set operations in queries, so it seemed sort of natural to me. For the record, this strategy didn't grow out of a need for cross-platform boolean data types. It actually grew out of a need for a way to centralize the definition of a "searchable" customer (see the "Surprise Benefit"). There were several places in our system that would require refactoring if our definition turned into a function rather than a column value.
Adam Paynter
@Stephanie: I love modern optimizers! :D I get to write queries in set notation!
Adam Paynter
+1 This approach can be very helpful, in cases where we need to know additional info about SearchableCustomers, but not about Customers; otherwise, we'd have to have extra NULLable columns in the Customer table, plus constraints to ensure they are NULL when they're not a SearchableCustomer, and ensure they are NOT NULL when they are.On the other hand, it can get a performance hit because of the extra block reads necessary for the table join.
Jeffrey Kemp
This is definitely an interesting way of doing things that I will have to keep in mind. Is it safe to guess that the use of this technique is a bit context relevant? I'm guessing you wouldn't want to do this if you aren't going to be searching on the field?
Rob
@Adam, you can write all kinds of nonsense queries that optimizers can work with... whether or not you've provided them with all the structures they need to be completely optimal is another matter.If you have a unique constraint on CustomerID in SearchableCustomer then it won't matter that you wrote an IN that should have been a JOIN.
Stephanie Page
@Stephanie: Ohhhhh, I think I'm starting to follow you. I agree, putting a unique constraint would allow me to use a `JOIN` without worrying about duplicates. I opted for the `IN` clause because it conveyed my thoughts better (however backwards those thoughts may be). :)
Adam Paynter
A: 

"SELECT * FROM mytable WHERE col1

, which is impossible in SQL Server and Oracle (you need to have some kind or a predicate there)."

Which goes only to show what a ludicrous and laughable abomination Oracle and SQL server really are.

If col1 is declared to be of type BOOLEAN, then the expression "col1" IS a predicate.

If the semantics of the WHERE clause require that its expression just evaluates to a truth value, and if some column is declared to be of the type "truth value", then "WHERE that-column" should be both allowed and supported. Period. Any system that doesn't just exposes its authors for the incompetent mediocre quacks that they are.

Erwin Smout
wow, ludicrous and an abomination because you have to type a whole *4 extra characters* to add a predicate after a column name? ok...
Jeffrey Kemp
methinks someone has an axe to grind. First you can't define a Boolean column in Oracle. So of course there's no WHERE boolcol. It doesn't exist. Erwin, give us your choice in RDBMS, I'll pick out 100 features that Oracle has that your fav doesn't.
Stephanie Page