views:

226

answers:

7

I've started a new project and they have a very normalized database. everything that can be a lookup is stored as the foreign key to the lookup table. this is normalized and fine, but I end up doing 5 table joins for the simplest queries.

    from va in VehicleActions
    join vat in VehicleActionTypes on va.VehicleActionTypeId equals vat.VehicleActionTypeId
    join ai in ActivityInvolvements on va.VehicleActionId equals ai.VehicleActionId
    join a in Agencies on va.AgencyId equals a.AgencyId
    join vd in VehicleDescriptions on ai.VehicleDescriptionId equals vd.VehicleDescriptionId
    join s in States on vd.LicensePlateStateId equals s.StateId
    where va.CreatedDate > DateTime.Now.AddHours(-DateTime.Now.Hour)
    select new {va.VehicleActionId,a.AgencyCode,vat.Description,vat.Code,
vd.LicensePlateNumber,LPNState = s.Code,va.LatestDateTime,va.CreatedDate}

I'd like to recommend that we denormaize some stuff. like the state code. I don't see the state codes changing in my lifetime. similar story with the 3-letter agency code. these are handed out by the agency of agencies and will never change.

When I approached the DBA with the state code issue and the 5 table joins. i get the response that "we are normalized" and that "joins are fast".

Is there a compelling argument to denormalize? I'd do it for sanity if nothing else.

the same query in T-SQL:

    SELECT VehicleAction.VehicleActionID
      , Agency.AgencyCode AS ActionAgency
      , VehicleActionType.Description
      , VehicleDescription.LicensePlateNumber
      , State.Code AS LPNState
      , VehicleAction.LatestDateTime AS ActionLatestDateTime
      , VehicleAction.CreatedDate
FROM VehicleAction INNER JOIN
     VehicleActionType ON VehicleAction.VehicleActionTypeId = VehicleActionType.VehicleActionTypeId INNER JOIN
     ActivityInvolvement ON VehicleAction.VehicleActionId = ActivityInvolvement.VehicleActionId INNER JOIN
     Agency ON VehicleAction.AgencyId = Agency.AgencyId INNER JOIN
     VehicleDescription ON ActivityInvolvement.VehicleDescriptionId = VehicleDescription.VehicleDescriptionId INNER JOIN
     State ON VehicleDescription.LicensePlateStateId = State.StateId
Where VehicleAction.CreatedDate >= floor(cast(getdate() as float))
+3  A: 

This previous post dealt with a similar issue to the one you're having. Hopefully it will be helpful to you.

http://stackoverflow.com/questions/507671/dealing-with-hypernormalized-data

My own personal take on normalization is to normalize as much as possible, but denormalize only for performance. And evn the denormalization for performance is something to avoid. I'd go the route of profiling,setting correct indexes, etc before I'd denormalize.

Sanity... That's overrated. Especially in our profession.

David Stratton
+1 for the "Sanity" soundbite. Mind if I quote you on occasion? ;-)
sleske
Not at all. Quote away.
David Stratton
+4  A: 

Some denormalization can be needed for performance (and sanity) reasons at some times. Hard to tell wihout seeing all your tables / needs etc...

But why not just build a few convenience views (to do a few joins) and then use these to be able to write simpler queries?

ChristopheD
+1 for the idea of views... Useful, simple suggestion.
David Stratton
The idea of small, simple, reusable functions should apply to all the code we right, when possible. I get a lot of mileage out of table-valued functions and views for things like this. And as a bonus, reporting becomes a lot easier as well.
overslacked
+2  A: 

Beware of wanting to shape things to your current idioms. Right now the unfamiliar code seems unweildy and obstructive to your understanding. In time it's possible that you will become acclimatised.

If current (or known future) requirements, such as performance are not being met then that's a whole different issue. But remember anything can be performance tuned, the objective is not to make things as fast as possible, but to make them fast enough.

djna
+1 for pointing out that developers tend to grow with time. I think in this situation, it's better to learn to deal with the hyper-normalized data and adjust rather than adjust the data to what we're comfortable with.
David Stratton
+5  A: 

I don't know if I would even call what you want to do denormalization -- it looks more like you just want to replace artificial foreign keys (StateId, AgencyId) with natural foreign keys (State Abbreviation, Agency Code). Using varchar fields instead of integer fields will slow down join/query performance, but (a) if you don't even need to join the table most of the time because the natural FK is what you want anyway it's not a big deal and (b) your database would need to be pretty big/have a high load for it to be noticeable.

But djna is correct in that you need a complete understanding of current and future needs before making a change like this. Are you SURE the three letter agency codes will never change, even five years from now? Really, really sure?

pjabbott
I used to be a huge fan of the elegance, logic and clarity of natural foreign keys, but they just aren't worth the constant maintenance hassle. So instead I created elegant tools to manage the artificial keys, and everyone's home in time for dinner.
overslacked
+2  A: 

An argument (for this "normalization") that the three-letter codes might change isn't very compelling without a plan for what you will do if the codes do change, and how your artificial-key scenario will address this eventuality better than using the codes as keys. Unless you've implemented a fully temporal schema (which is horribly difficult to do and not suggested by your example), it's not obvious to me how your normalization benefits you at all. Now if you work with agencies from multiple sources and standards that might have colliding code names, or if "state" might eventually mean a two-letter code for state, province, department, canton, or estado, that's another matter. You then need your own keys or you need a two-column key with more information than that code.

Steve Kass
+3  A: 

Well, what about the performance? If the performance is okay, just make the five table JOIN into a view and, for sanity, SELECT from the view when you need the data.

State abbreviations are one of the cases in which I think meaningful keys are okay. For very simple lookup tables with a limited number of rows and where I'm in complete control of the data (meaning it's not populated from some outside source) I'll sometimes create meaningful four or five character keys so that the key value can proxy for the fully descriptive lookup value in some queries.

Larry Lustig
+3  A: 

Create a view (or inline table-valued function to get parameterization). In any case, I usually put all my code into SPs (some code generated) whether they use views or not and that's that, you pretty much only ever write the join once.

Cade Roux