tags:

views:

37

answers:

4

In a recent CODE Magazine article, John Petersen shows how to use bitwise operators in TSQL in order to store a list of attributes in one column of a db table.

Article here.

In his example he's using one integer column to hold how a customer wants to be contacted (email,phone,fax,mail). The query for pulling out customers that want to be contacted by email would look like this:

SELECT C.*   
FROM   dbo.Customers C  
       ,(SELECT 1   AS donotcontact   
               ,2  AS email   
               ,4  AS phone   
               ,8  AS fax   
               ,16 AS mail) AS contacttypes   
WHERE  ( C.contactmethods & contacttypes.email <> 0 )
AND    ( C.contactmethods & contacttypes.donotcontact = 0 )   

Afterwards he shows how to encapsulate this in to a table function.

My questions are these:

1. Is this a good idea? Any drawbacks? What problems might I run in to using this approach of storing attributes versus storing them in two extra tables (Customer_ContactType, ContactType) and doing a join with the Customer table? I guess one problem might be if my attribute list gets too long. If the column is an integer then my attribute list could only be at most 32.

2. What is the performance of doing these bitwise operations in queries as you move in to the tens of thousands of records? I'm guessing that it would not be any more expensive than any other comparison operation.

+2  A: 

If you wish to filter your query based on the value of any of those bit values, then yes this is a very bad idea, and is likely to cause performance problems.

Besides, there simply isn't any need - just use the bit data type.

The reason why using bitwise operators in this way is a bad idea is that SQL server maintains statistics on various columns in order to improve query performance - for example if you have an email column, SQL server can tell you roughly what percentage of values that email column are true and select an appropriate execution plan based on that knowledge.

If however you have flags column, SQL server will have absolutely no idea how many records in a table match flags & 2 (email) - it doesn't maintain these sorts of indexes. Without this sort of information available to it SQL server is far more likely to choose a poor execution plan.

Kragen
Ah, yes, great point about statistics. That thought had not even crossed my mind. Not sure what you mean by "just use the bit data type". How does using the bit data type address this particular problem?
Mike Brady
@Mike - I meant just store each of the flags in a bit column, however I admit I hadn't properly read the article when I passed judgement! I would probably just accept that certain entities would have null columns in the cases where the flags were not applicable to that entity - perhaps I might consider putting the flags into a separate joining table, however there would need to be a lot of bit flags before it was worthwhile.
Kragen
A: 

In opposite order: The best way to know what your performance is going to be is to profile.

This is, most definately, an "It Depends" question. I personally would never store such things as integers. For one thing, as you mention, there's the conversion factor. For another, at some point you or some other DBA, or someone is going to have to type:

Select CustomerName, CustomerAddress, ContactMethods, [etc]
From Customer
Where CustomerId = xxxxx

because some data has become corrupt, or because someone entered the wrong data, or something. Having to do a join and/or a function call just to get at that basic information is way more trouble than it's worth, IMO.

Others, however, will probably point to the diversity of your options, or the ability to store multiple value types (email, vs phone, vs fax, whatever) all in the same column, or some other advantage to this approach. So you would really need to look at the problem you're attempting to solve and determine which approach is the best fit.

AllenG
+2  A: 

And don't forget the maintenance problems using this technique would cause. As it is not standard, all new devs will probably be confused by the code and not know how to adjust it properly. Errors will abound and be hard to find. It is also hard to do reporting type queries from. This sort of trick stuff is almost never a good idea from a maintenance perspective. It might look cool and elegant, but all it really is - is clunky and hard to work with over time.

HLGEM
+1  A: 

One major performance implication is that there will not be a lookup operator for indexes that works in this way. If you said WHERE contact_email=1 there might be an index on that column and the query would use it; if you said WHERE (contact_flags & 1)=1 then it wouldn't.

** One column stores one piece of information only - it's the database way. **

(Didnt see - Kragen's answer also states this point, way before mine)

Kieren Johnstone
Good point. Thanks for the reminder! To be fair, the author of the article did state as a requirement that this is an option when database changes need to be minimized.
Mike Brady