views:

197

answers:

6

We are porting an old application that used a hierarchical database to a relational web app, and are trying to figure out the best way to port configuration switches (Y/N values).

Our old system had 256 distinct switches (per client) that were each stored as a bit in one of 8 32-bit data fields. Each client would typically have ~100 switches set. To read or set a switch, we'd use bitwise arithmetic using a #define value. For example:

if (a_switchbank4 & E_SHOW_SALARY_ON_CHECKS) //If true, print salary on check

We were debating what approach to store switches in our new relational (MS-SQL) database:

  1. Put each switch in its own field
    • Pros: fast and easy read/write/access - 1 row per client
    • Cons: seems kludgey, need to change schema every time we add a switch
  2. Create a row per switch per client
    • Pros: unlimited switches, no schema changes necessary w/ new switches
    • Cons: slightly more arduous to pull data, lose intellisense w/o extra work
  3. Maintain bit fields
    • Pros: same code can be leveraged, smaller XML data transmissions between machines
    • Cons: doesn't make any sense to our developers, hard to debug, too easy to use wrong 'switch bank' field for comparison

I'm leaning towards #1 ... any thoughts?

+4  A: 

It depends on a few factors such as:

  • How many switches are set for each client
  • How many switches are actually used
  • How often switches are added

If I had to guess (and I would be guessing) I'd say what you really want are tags. One table has clients, with a unique ID for each, another has tags (the tag name and a unique ID) and a third has client ID / tag ID pairs, to indicate which clients have which tags.

This differs from your solution #2 in that tags are only present for the clients where that switch is true. In other words, rather than storing a client ID, a switch ID, and a boolean you store just a client ID and a switch ID, but only for the clients with that switch set.

This takes up about one third the space over solution number two, but the real advantage is over solutions one and three: indexing. If you want to find out things like which clients have switches 7, 45, and 130 set but not 86 or 14, you can do them efficiently with a single index on a tag table, but there's no practical way to do them with the other solutions.

MarkusQ
Each client has ~100 switches set at any one time, all 256 switches are currently in use (we're out of room to add more on our old system), and switches are added about every 3 months (but our code changes at the same time, so coordinating a rollout of the two may not be too difficult).
Jess
BTW - I think your solution for 'tags' are the same as our solution #2 (row per switch per client) ... just more eloquent. Let me know if this changes your mind ... thanks!
Jess
@LuckyLindy No, it's not the same as your #2--see my edit.
MarkusQ
Markus - thanks for the revision ... I can see your point and think we'll migrate that way. Thanks!
Jess
A: 

I'd also lean toward option 1, but would also consider an option 4 in some scenarios.

4- Store in dictionary of name value pairs. Serialize to database.

tarn
A: 

I would recommend option 2. It's relatively straightforward to turn a list of tags/rows into a hash in the code, which makes it fairly easy to check variables. Having a table with 256+ columns seems like a nightmare.

One problem with option #2 is that having a crosstab query is a pain:

Client   S1 S2 S3 S4 S5 ...
A            X     X
B         X  X  X

But there are usually methods for doing that in a database-specific way.

Jeffrey Melloy
+2  A: 

You could think about using database views to give you the best of each solution. For example store the data as one row per switch, but use a view that pivots the switches (rows) into columns where this is more convenient.

cindi
Great idea, but we need to be able to support SQL Server Desktop/CE, which means no views.
Jess
+1  A: 

I would go with option #2, one row per flag.

However, I'd also consider a mix of #1 and #2. I don't know your app, but if some switches are related, you could group those into tables where you have multiple columns of switches. You could group them based on use or type. You could, and would probably still have a generic table with one switch per row, for ones that don't fit into the groups.

KM
+1  A: 

Remember too if you change the method, you may have a lot of application code to change that relys on the existing method of storing the data. Whether you should change the method may depend on exactly how hard it will be and how many hours it will take to change everything associated. I agree with Markus' solution, but you do need to consider exactly how hard refactoring is going to be and whether your project can afford the time. The refactoring book I've been reading would suggest that you maintain both for a set time period with triggers to keep them in synch while you then start fixing all the references. Then on a set date you drop the original (and the triggers) from the database. This allows you to usue the new method going forth, but gives the flexibility that nothing will break before you get it fixed, so you can roll out the change before all references are fixed. It requires discipline however as it is easy to not get rid of the legacy code and columns because everything is working and you are afraid not to. If you are in the midst of total redesign where everything will be tested thougroughly and you have the time built into the project, then go ahead and change everything all at once.

HLGEM