views:

274

answers:

11

Let's say you are a GM dba and you have to design around the GM models

Is it better to do this?

  • table_model
    • type {cadillac, saturn, chevrolet}

Or this?

  • table_cadillac_model
  • table_saturn_model
  • table_chevrolet_model

Let's say that the business lines have the same columns for a model and that there are over a million records for each subtype.

EDIT:

  • there is a lot of CRUD
  • there are a lot of very processor intensive reports
  • in either schema, there is a model_detail table that contains 3-5 records for each model and the details for each model differ (you can't add a cadillac detail to a saturn model)
  • the dev team doesn't have any issues with db complexity
  • i'm not really sure that this is a normalization question. even though the structures are the same they might be thought of as different entities.

EDIT:

Reasons for partitioning the structure into multiple tables - business lines may have different business rules regarding parts - addModelDetail() could be different for each business line (even though the data format is the same) - high add/update activity - better performance with partitioned structure instead of single structure (I'm guessing and not sure here)?

I think this is a variation of the EAV problem. When posed as a EAV design, the single table structure generally gets voted as a bad idea. When posed in this manner, the single table strucutre generally gets voted as a good idea. Interesting...

I think the most interesting answer is having two different structures - one for crud and one for reporting. I think I'll try concatenated/flattened view for reporting and multiple tables for crud and see how that works.

A: 

Given the description that you have given us, the answer is either.

In other words you haven't given us enough information to give a decent answer. Please describe what kind of queries you expect to perform on the data.

[Having said that, I think the answer is going to be the first one ;-) As I imaging even though they are different models, the data for each model is probably going to be quite similar.

But this is a complete guess at the moment.]

Edit: Given your updated edit, I'd say the first one definitely. As they have all the same data then they should go into the same table.

Stevo
+9  A: 

Definitely the former example. Do you want to be adding tables to your database whenever you add a new model to your product range?

Paul Mitchell
not a new model, an entire new business line (GM buys Ford). the occurrence would be very infrequent.
mson
i appreciate paul answering the question, but the assumption in the response is clearly wrong. models can readily be added to either schema. only a new business line requires a new table.
mson
why is this the answer with the most upvotes????
mson
@mson: why would you design an artificial limitation into your software? If you need to handle three biz lines, why not design the data model to support 1-n lines rather than hard-coding the three that exist *now*.
Shog9
The answer has the most up votes cause it is the best answer at the moment. If you expect something else perhaps you could either: provide more info on the purpose of the app, expand on the data model or accept the question? Have you already written the app like the latter?
Quibblesome
@Shog9 - because the approach of predicting what all potential future and unasked for needs will be is not good software practice. Why not anticipate spaceships? Why not anticipate boats? ... The examples are a bit exaggerated, but GM has not introduced lines very frequently.
mson
@mson: i'm not necessarily talking about supporting future lines. Presumably, there can be shared logic - and tests - between lines; imposing artificial distinctions between the data access layers needed for each removes your ability to factor out duplicate functionality *now*.
Shog9
A: 

Another thing to consider in defining "better"--will end users be querying this data directly? Highly normalized data is difficult for end-users to work with. Of course this can be overcome with views but it's still something to think about as you're finalizing your design.

I do agree with the other two folks who answered: which form is "better" is subjective and dependent on what you're hoping to achieve. If you're hoping to achieve very quick queries that's one thing. If you're hoping to achieve high programmer productivity--that's a different goal again and possibly conflicts with quick queries.

Onorio Catenacci
don't see anything wrong with this answer
Robert Gould
+1  A: 

If the tables really do have the same columns, then the former is the best way to do it. Even if they had different columns, you'd probably still want to have the common columns be in their own table, and store a type designator.

Dave Markle
+2  A: 

Use the former. Setting up separate tables for the specialisations will complicate your code and doesn't bring any advantages that can't be achieved in other ways. It will also massively simplify your reports.

ConcernedOfTunbridgeWells
A: 

Choice depends on required performance. The best database is normalized database. But there could be performance issues in normalized database then you have to denormalize it. Principle "Normalize first, denormalize for performance" works well.

Oleg
The best database is not a normalized database. If all a database is ever used for is reporting, normalization is the completely wrong strategy.
Robert C. Barth
+1  A: 

You could try having two separate databases.

One is an OLTP (OnLine Transaction Processing) system which should be highly normalized so that the data model is highly correct. Report performance must not be an issue, and you would deal with non-reporting query performance with indexes/denormalization etc. on a case-by-case basis. The data model should try to match up very closely with the conceptual model.

The other is a Reports system which should pull data from the OLTP system periodically, and massage and rearrange that data in a way that makes report-generation easier and more performant. The data model should not try to match up too closely with the conceptual model. You should be able to regenerate all the data in the reporting database at any time from the data currently in the main database.

Justice
A: 

It depends on the datamodel and the use case. If you ever need to report on a query that wants data out of the "models" then the former is preferable because otherwise (with the latter) you'd have to change the query (to include the new table) every time you added a new model.

Oh and by "former" we mean this option:

table_model
* type {cadillac, saturn, chevrolet}
Quibblesome
+1  A: 

I would say the first way looks better.

Are there reasons you would want to do it the second way?

The first way follows normalization better and is closer to how most relational database schema are developed.

The second way seems to be harder to maintain.

Unless there is a really good reason for doing it the second way I would go with the first method.

metanaito
+2  A: 

On data with a lot of writes, (e.g. an OLTP application), it is better to have more, narrower tables (e.g. tables with fewer fields). There will be less lock contention because you're only writing small amounts of data into different tables.

So, based on the criteria you have described, the table structure I would have is:

Vehicle
  VehicleType
  Other common fields

CadillacVehicle
  Fields specific to a Caddy

SaturnVehicle
  Fields specific to a Saturn

For reporting, I'd have an entirely different database on an entirely different server that does not have the normalized structure (e.g. just has CadillacVehicle and SaturnVehicle tables with all of the fields from the Vehicle table duplicated into them).

With proper indexes, even the OLTP database could be performant in your SELECT's, regardless of the fact that there are tens of millions of rows. However, since you mentioned that there are processor-intensive reports, that's why I would have a completely separate reporting database.

One last comment. About the business rules... the data store cares not about the business rules. If the business rules are different between models, that really shouldn't factor into your design decisions about the database schema (other than to help dictate which fields are nullable and their data types).

Robert C. Barth
A: 
Jonathan Leffler