Just looking for opinions on the following 2 scenarios.

We have a table where we store our outbound SMS messages. Every time one of our services sends a premium rate message, it stores it in this table... to date, all the important information that needs to be stored has been in the same format.

SMSMessages
----------------------
ID              int PK NOT NULL Identity
Mobile          nvarchar(50) -- the number we're sending to
NetworkID       int FK -> Table containing networks (voda, o2, etc...)
ShortcodeID     int FK -> Table containing our outbound shortcodes
DateSent        DateTime

Now one of the networks has implemented a completely new API that we need to integrate with, and it requires a bunch more parameters. One of these additional parameters is the "Command". Depending on which command we're sending, there are between 4 and 8 additional parameters we are required to send. For simplicity's sake, we'll say there are only two commands... "InitialSend" & "AnniversarySend".

Obviously it would be quite the horrible DB design to just add all these additional columns to the end of our existing table, so... we reckon we've got two options.

Option 1.

Create many new tables, specific to each command, linked back to the original table.

SMSMessages_CommandTypes --Contains "InitialSend" & "AnniversarySend" + other commands
--------------------------
CommandTypeID   int PK
Command     nvarchar(50)


SMSMessages_OddBallNetwork
--------------------------
ID              int PK, FK --> SMSMessages.ID
CommandTypeID   int FK ---> SMSMessages_CommandTypes


SMSMessages_OddBallNetwork_InitialSend
--------------------------------------
ID              int PK, FK --> SMSMessages.ID
Param1          nvarchar(50)
Param6          nvarchar(50)
Param9          nvarchar(50)
Param14         nvarchar(50)

SMSMessages_OddBallNetwork_AnniversarySend
--------------------------------------
ID              int PK, FK --> SMSMessages.ID
Param1          nvarchar(50)
Param2          nvarchar(50)
Param7          nvarchar(50)
Param9          nvarchar(50)
Param12         nvarchar(50)

-- There are 4 other command types as well, so 4 more tables...

The pros to this one, according to our DBA, are all purist: each possible combination is strongly defined, the relationships are clear, and it is the best performer.

From my POV, the cons are development time, number of touch points, complex retrieval rules/procedures for messages with different command types, and lack of reusability... a new command on this mobile network, or another network introducing a similar API, requires DB-level design and implementation... not just code-level changes.
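To give an idea of the retrieval complexity, here's a rough sketch of the kind of query we'd need to pull back a single oddball message with its command-specific parameters (only 2 of the 6 command tables shown, and the column aliases are just illustrative):

SELECT  m.ID, m.Mobile, m.DateSent, ct.Command,
        ins.Param1  AS Init_Param1,  ins.Param6  AS Init_Param6,
        ins.Param9  AS Init_Param9,  ins.Param14 AS Init_Param14,
        ann.Param1  AS Anni_Param1,  ann.Param2  AS Anni_Param2,
        ann.Param7  AS Anni_Param7,  ann.Param9  AS Anni_Param9,
        ann.Param12 AS Anni_Param12
FROM    SMSMessages m
JOIN    SMSMessages_OddBallNetwork obn
            ON obn.ID = m.ID
JOIN    SMSMessages_CommandTypes ct
            ON ct.CommandTypeID = obn.CommandTypeID
LEFT JOIN SMSMessages_OddBallNetwork_InitialSend ins
            ON ins.ID = m.ID
LEFT JOIN SMSMessages_OddBallNetwork_AnniversarySend ann
            ON ann.ID = m.ID
-- ...plus a LEFT JOIN for each of the other 4 command tables
WHERE   m.ID = @MessageID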

Option 2.

This option is to try and design one dynamic implementation with fewer, more reusable structures.

SMSMessages_AdditionalParameterTypes
------------------------------------
ParameterTypeID int PK NOT NULL Identity
ParameterType   nvarchar(50)

/*
This table will contain all known parameters for any messages
CommandName
Param1
Param2
etc..
*/

SMSMessages_AdditionalParameters
--------------------------------
ID              int PK NOT NULL Identity
MessageID       int FK --> SMSMessages
ParamTypeID     int FK --> SMSMessages_AdditionalParameterTypes
Value           nvarchar(255)

So pros and cons on this one.

Cons: You have less obvious visibility as to which params are linked with which messages. There's also a small performance issue... N inserts per message instead of just 2.

Pros: It's a hell of a lot easier to develop against (imho). You simply get a list of Parameter Name -> Value pairs back for a given MessageID.

It's also a lot more reusable... if the oddball network adds a new command or a new parameter on a command, or even if another network comes along and implements a similar "I want more info" API, we don't need any structural changes in our system.
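For comparison, the retrieval under Option 2 is the same simple query regardless of command or network (just a sketch, using the tables above):

SELECT  pt.ParameterType, ap.Value
FROM    SMSMessages_AdditionalParameters ap
JOIN    SMSMessages_AdditionalParameterTypes pt
            ON pt.ParameterTypeID = ap.ParamTypeID
WHERE   ap.MessageID = @MessageID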

So... what would you do?

+1  A: 

I would create the following tables

SMSMessages
SMSMessages_AdditionalParameterTypes(ParameterTypeID, ParameterType, Operator)
SMSMessages_Parameters(MessageID, ParamTypeID, Value)
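Roughly, in SQL Server DDL (the Operator column is an assumption on my part: a reference to your networks table so parameter types can be scoped per operator):

CREATE TABLE SMSMessages_AdditionalParameterTypes (
    ParameterTypeID int IDENTITY PRIMARY KEY,
    ParameterType   nvarchar(50) NOT NULL,
    Operator        int NULL    -- assumed FK to your networks table
);

CREATE TABLE SMSMessages_Parameters (
    MessageID   int NOT NULL REFERENCES SMSMessages (ID),
    ParamTypeID int NOT NULL REFERENCES SMSMessages_AdditionalParameterTypes (ParameterTypeID),
    Value       nvarchar(255) NULL,
    PRIMARY KEY (MessageID, ParamTypeID)
);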
Pentium10
+1  A: 

Obviously it would be quite the horrible DB design to just add all these additional columns to the end of our existing table

Don't think of that as horrible design: it will certainly keep your code as simple as it can be. Use good names for the columns, not "param1" or anything like that.

A foreign key relation for CommandType is a good idea (basically the SQL equivalent of an enum).

Placing what would normally be columns in a different table (aka "Dynamic Columns") adds a lot of complexity. In practice this is almost never worth it, unless you intend to allow end-users to add dynamic columns.

Most importantly, once you've made a design, write a couple of example queries for common tasks. That usually helps to clarify which complexity is actually worth it.
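For example, a rough sketch of the extra columns and one such query (the column names here are purely illustrative, since the actual API parameters aren't listed):

ALTER TABLE SMSMessages ADD
    CommandTypeID   int NULL,           -- FK -> command types; NULL for networks that don't use it
    AnniversaryDate datetime NULL,      -- illustrative name only
    CampaignRef     nvarchar(50) NULL;  -- illustrative name only

-- e.g. all anniversary sends to a given number
SELECT  m.ID, m.Mobile, m.DateSent, m.CampaignRef
FROM    SMSMessages m
JOIN    SMSMessages_CommandTypes ct ON ct.CommandTypeID = m.CommandTypeID
WHERE   ct.Command = 'AnniversarySend'
  AND   m.Mobile = @Mobile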

Andomar
I mean horrible in the sense that your data is going to be very badly normalised... the large majority of data in the original table will not be for the oddball network, so all those columns will be null. It also doesn't solve the problem of additional requirements requiring additional database development instead of additional coding development.
Eoin Campbell
Thanks for the feedback btw
Eoin Campbell
+2  A: 

Option 2 wins for me. For the performance issue, you should only have extra inserts for the oddball network, and that one is going to be a problem anyway. For the visibility issue, I think that is a matter of perception. After working with the new system for a while, it will probably become second nature to see the message parameter requirements.

Ray
Yeah my feeling exactly.
Eoin Campbell
+3  A: 

Why

Why do you need to store this information? Is it reported on? Searched for? Used for summaries and categorical analysis? In real-time? Frequently?

Do Less

If this data is merely for logging, stick it in a text or xml field and forget about it. YAGNI (You Ain't Gonna Need It) seems likely...

Seriously

Without knowing what the data is used for, no one can answer this question, including you.

Yes, a fully-normalized logical database structure is great and provides clarity etc. But is it useful?

Not all data is gold; some is just CYA

Steven A. Lowe
Eoin Campbell
If not the right answer, certainly the right attitude to take when modeling data.
Philip Kelley
@[Eoin Campbell]: text/xml fields are searchable (use a LIKE query); reconstructing the messages should be trivial, and, in the event of an audit, the data can always be dumped and parsed. Seriously consider the level of effort and maintenance complexity being introduced vs the rarity of the requirements for actual use. Then consider what will happen when the next SMS provider has even more different formats...
Steven A. Lowe
+2  A: 

It sounds like a data logging situation that only needs to be "good enough" for CYA purposes. The fully normalized model is good and proper, but it does seem like overkill for what you will be doing. If you just need a few canned queries in case some auditors come visiting, those queries presumably don't need sub-second response times.

I'm upvoting prior entries accordingly.

Philip Kelley
+1  A: 

Personally, option 2 makes me shudder, as I'm sure it does your DBAs. Entity-attribute-value tables are the worst way to store data if you need to query on it. This is not generally the "small" performance hit you envision but the "big" performance hit your DBAs envision. DBAs are always getting stuck fixing the poorly performing systems people designed this way because it looks more object-oriented and understandable to them. That's why we hate it when people suggest these types of designs.

Designing database storage for ease of programmers is short-sighted and unprofessional. Databases must be designed for integrity, performance and security; time to develop against them is a distant, distant fourth. Databases don't refactor as easily as application code, so you may be stuck with this design for the next twenty years.

Now the real question is: do you need to query on this data (other than returning the values in a query based on some other criteria)? If you do not need to query it, or query it very infrequently, then just put all the extra data into a varchar(max) field and be done with it.
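For example (ExtraParams is just an illustrative column name; the blob can be whatever serialized format is easiest to write out):

ALTER TABLE SMSMessages ADD ExtraParams varchar(max) NULL;

-- the rare ad-hoc lookup can still be done with LIKE if it ever comes up
SELECT  ID, Mobile, DateSent, ExtraParams
FROM    SMSMessages
WHERE   ExtraParams LIKE '%AnniversarySend%'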

HLGEM
Thanks for the feedback HLGEM. Completely taken on board. You're bang on with the angle I'm taking. I'm a dev and seeing it from a dev/coding/this-project point of view. Appreciate a different opinion on it.
Eoin Campbell