views:

46

answers:

3

Our real estate application has a table, Events, which has historically been linked to the Homes table via an Event.homes_id column.

Recently, for the first time, we added an event type which is not connected to a home, but a realtor. The question: is it good practice to now add a realtor_id column to the Events table? Something in me rebels at the idea of having two columns, home_id and realtor_id for every record, one of which will always be null for any given record. My boss says it's efficient and avoids the overhead of creating new tables. What are the rights and wrongs of this situation?

A corollary to the above question: part of our reluctance to create new tables is the fact that we're using CakePHP, and so it becomes harder to have absolute control over multiple linked tables via SQL joins. (Setting Cake's recursive property to maximum reduces the application's speed to a crawl.) Does, and should, working with Cake affect database design considerations? Or are we just working with Cake wrong?

+2  A: 

My boss says it's efficient and avoids the overhead of creating new tables.

This strikes me as iffy. I think you need a different design.

Specifically, I would consider having Events owned by Homes and Realtors. By restructuring this you avoid the issue of one-of-two IDs being meaningful. I would represent Realtors/Homes to Events as has_many and the reverse as multiple belongs_to relationships if you actually need it.

Kalium
+1 If new tables are overhead, I can't imagine what he'd call the hoops you have to jump to in order to figure out whether a given event is associated with a home or a realtor every time you retrieve it. Tweaking the cardinality of your relationships will involve some work to migrate the keys, but should be easy for the rest of the life of the application.
Rob Wilkerson
It's not *too* difficult to just find all from events "where home_id = n" or "where realtor_id = n", right now anyway, but I'm glad that people share my intuitions that this is messy. I'm having trouble visualising how to refactor the relationships still: what other_table_id columns will continue to be needed in the Events table?
thesunneversets
Under the schema I suggested, none.
Kalium
(Sorry for needing to be led by the nose through this a bit...) Am I creating table(s) to cross-reference event_id with realtor_id, and event_id with home_id? Or if I'm missing a really neat way of building connections between my data, I'd like to know about it...
thesunneversets
You'd need two intermediate tables, one for realtors to events and one for homes to events. Just your basic HABTAM tables should cover it.
Kalium
Thanks! That does sound pretty likely to be the "proper" approach.
thesunneversets
+2  A: 

Something in me rebels at the idea of having two columns, home_id and realtor_id for every record, one of which will always be null for any given record. My boss says it's efficient and avoids the overhead of creating new tables. What are the rights and wrongs of this situation?

Well, you're right, it probably is less efficient than optimal. However, adding another column (an INT, no less) that will be null 50% of the time isn't going to affect your overall database efficiency.

OTOH, it's going to take you a bit of effort to restructure your application. By just adding this column, you're effectively putting in a hack.

I think that's acceptable for this situation, although you may not like the aesthetics of it. Hey -- nobody likes hacks. It increases "technical debt". But google around for that term and you'll see that lots of people are saying to embrace technical debt, because it lets you continue moving forward, rather than trying to zero in on the perfect solution (which will escape you, despite best efforts).

It's a business decision -- is the aesthetic of your schema and codebase worth the cost (your hourly rate * # hours to "properly" fix it)? In this situation, I'd say it's probably not.

Travis Leleu
It's true that everything <i>does work</i> right now and maybe I should just be thankful for that! However being a very junior programmer in the grand scheme of things I always want to be sure I'm not committing too many crimes against best practice... always good to run things past the combined wisdom of the internet!
thesunneversets
+1 for technical debt and moving forward philosophies.
Jason McCreary
Building up technical debt without good reason will almost always come back and bite you in the ass later.
Kalium
@thesunneversets yea, one of the things that will come with experience is the ability to figure out when something is sufficiently "elegant". @Kalium of course, "good reason" could simply be "rapid development / i need to progress forward / this is 'good enough'". It's true that technical debt is just that -- a debt that has to be repaid (usually in a rewrite, sub-optimal operations, breakages, etc.)
Travis Leleu
+1  A: 

"we're using CakePHP, and so it becomes harder to have absolute control" - Why?

What do you actually lose by adding another column? Not a lot. Pride, maybe. All applications have compromises somewhere and this is a tiny one.

"(Setting Cake's recursive property to maximum reduces the application's speed to a crawl.)" - Then don't use recursive! Containable behaviour does a much better job and you can get only the data you want from however deep you need to go.

I would add another column, optimise with containable and move onto more important things.

Leo
Wow, Containable looks *incredibly* useful. The things they don't always tell you when you're just starting out with CakePHP...
thesunneversets