views:

62

answers:

1

I am a recent college graduate and a new hire for software development. Things have been a little slow lately so I was given a db task. My db skills are limited to pet projects with Rails and Django. So, I was a little surprised with my latest task.

I have been asked by my manager to subclass Person with a 'Parent' table and add a reference to their custodian in the Person table. This is to facilitate going from Parent to Form when the custodian, not the Parent, is the FormContact.

Here is a simplified, mock structure of a sql-db I am working with. I would have drawn the relationship tables if I had access to Visio.

We have a table 'Person' and we have a table 'Form'. There is a table, 'FormContact', that relates a Person to a Form, not all Persons are related to a Form. There is a relationship table for Person to Person relationships (Employer, Parent, etc.)

I've asked, "Why this couldn't be handled by a query?" Response, Inefficient. (Really!?!)

So, I ask, "Why not have a reference to the Form? That would be more efficient since you wouldn't be querying the FormContacts table with the reference from child/custodian." Response, this would essentially make the Parent is a FormContact. (Fair enough.)

I went ahead an wrote a query to get from non-FormContact Parent to Form, and tested on the production server. The response time was instantaneous. *SOME_VALUE* is the Parent's fk ID.

SELECT FormID 
FROM FormContact 
WHERE FormContact.ContactID 
    IN (SELECT SourceContactID 
        FROM ContactRelationship
        WHERE (ContactRelationship.RelatedContactID = *SOME_VALUE*) 
            AND (ContactRelationship.Relationship = 'Parent'));

If I am right, "This is an unnecessary change." What should I do, defend my position or should I concede to the managers request?

If I am wrong. What is my error? Is there a better solution than the manager's?

+2  A: 

First things first, your query could use some reworking. Rather than subselects, try using a join:

SELECT FormID

FROM FormContact fc

JOIN ContactRelationship cr on cr.SourceContactID = fc.ContactID 
                               and cr.Relationship = 'Parent'

WHERE cr.RelatedContactID = @parent_id

Secondly, the issue you're dealing with is normalization vs. performance. From a purity perspective, yes, your solution is "more correct" (as you aren't duplicating data, which eliminates the possibility for the disparities in the duplicated data causing conflicts and aberrant behavior), but pure normalization is not always the wisest course of action.

Normalization can induce performance penalties, especially in larger data sets. These penalties have to be weighed alongside the benefits from normalization to see which side "wins".

That being said, I can't see how joining the Person table again on the ParentID column (I'm assuming that's what you'd be adding) would provide a performance boost over the join listed above, assuming that the columns in question are properly indexed.

If the query above works for you and you do rigorous performance testing to show that it's valid, take it to your manager and ask for his input. Because you're new and fresh out of college, be very willing to defer to your manager's judgment and wishes on this one. There will be much bigger battles to fight in the future.

Adam Robinson
I like this answer. I can adapt the SQL Query to the actual db and the explanation of normalization v. performance is informative. Also, it addresses the manager issue. Thanks.
sanscore