views:

293

answers:

7

Should this be represented in the database as 1 table or 3 tables? I and my friend have different opinions about this so I'd like to see the general views on this. (Maybe it should be a vote for either solution?)

Create Table Order
// Basic fields of the table
 - ID (Primary key)
 - CustomerID  (integer, with a FK)
 - Quantity
 - ProductID  (integer, with a FK)

 // Then depending on user selection, either these fields need to be specified 
 // (could be factored out to a separate table):
 {
 - InternalAccountID (integer, with a FK)
 - InternalCompanyID (integer, with a FK)
 }

 // Or these (could be factored out to a separate table):
 {
 - ExternalAccountNumber (free text string)
 - ExternalCompanyName (free text string)
 - ExtraInformation (free text string)
 }

1 table approach:

Pros:

  • performance (one insert as opposed to two, FK check, no joins)
  • probably takes less space (the extra tables have overhead + indexes + extra ID field)
  • one table as opposed to three
  • hardly justifiable to have split out to new tables just for 2+3 fields (or what?)

Cons:

  • Nullable fields
  • Potentially extra "type" column (can be skipped)
  • Breaks 3NF (?)

Pros and cons kindly requested as well as personal opinions. :)

EDIT: I tried simplifying the example by using different entities than I am actually using so any suggestions to altering the model wouldn't really help me. I.e. focus on the technical aspects more than the domain model please.

+2  A: 

My opinion would be that if

 // Then depending on user selection, either these fields need to be specified 
 // (could be factored out to a separate table):
 {
 - InternalAccountID (integer, with a FK)
 - InternalCompanyID (integer, with a FK)
 }

 // Or these (could be factored out to a separate table):
 {
 - ExternalAccountNumber (free text string)
 - ExternalCompanyName (free text string)
 - ExtraInformation (free text string)
 }

are always 1:1 with an order (i.e., you can't have 3 accountIDs), then leave it as one table. To take care of your null issue, you could add one more column called InternalCustomer (boolean) or CustomerType (varChar) that you could use to define an internal or external customer to know which of the two sets of fields you should look at for a specific customer.

Since we don't know the full use of this data or the schema for the entire DB, any response on this can't really be fully qualified.

Tommy
yes, it's a pure 1:1 relationship and this extra data only relates to the order itself (no other objects need this or this kind of data)
AtliB
A: 

If you want to avoid data duplication, you should go with a 2- or 3-table solution. For example, if you have the External columns in the Order table, value could exist multiple times. If the data looks like this:

ID   ExternalCompanyName
1    ACME
2    ACME
3    My Company
4    ACME

Now, if ACME changes names to ACME, Inc. you must update many rows. If the tables are normalized, where external companies are in a separate table, you would update one row. Note, there may be an argument to put Account Number in it's own table, but we'll leave that for extreme normalization.

It doesn't appear to be a 1-to-1 relationship between an order and a company/account, unless each company/account can only have one order. it sounds more like a 1-to-many relationship.

Now, what happens if a mistake is made when updating the ExternalCompanyName in a single-table environment, and only some of the rows get updated. You have some rows with ACME and some rows with ACME, Inc. You end up with a bad-data situation.

Also, if this is really a 1-to-many relationship, you really aren't saving space. You are duplicating data in the order, rather than storing it once in another table.

bobs
Thanks, but the purpose of the External** columns is actually to have ad-hoc information that should not change. If I wanted to reuse some data, I would set up an internal account and use those instead.
AtliB
A: 

As the volume increases selection from two tables may be a lot faster than one. Sometimes this kind of refactoring (partition) is done on mature databases to increase performance.

Imagine using this for a multi table join, where some criteria are on this table, but others are in different tables.

select from order join customer using (customer_id)
where
    order.order_date between ? and ?
    and customer.name = ?

It may end up fetching all order rows for the dates from disk, then throwing many of them away because they don't match the join. This fetch from disk is bound to be slow and may to spoil your RAM cache.

select from order join order_detail using (order_id) join customer using (customer_id)
where
    order.order_date between ? and ?
    and customer.name = ?

In this case when it loads all order rows from disk it's not going to hurt as bad as previously, because the table is narrower and smaller. It doesn't need to load all the lengthy fields which are irrelevant for filtering. Eventually, after join to customer, it will only fetch those order_detail rows which match all criteria.

If you expect this to be large, you should consider splitting the table so that the fields which are most critical for searches are in one table, and "data" fields in other one-to-one table(s).

The bottom line is: Normal form and domain is one thing, but performance often requires tradeoffs. You can hide some of them (cover the split with a view), but not all (duplicate/aggregate fields for the sake of faster selects).

Konrad Garus
In my sample, I would actually always filter on the base-fields - never from the "side-tables". So searching fields would most probably be more expansive with 2 or 3 tables than 1.
AtliB
Right, so that's why 2-3 tables is likely to be faster.
Konrad Garus
Well, normally you would be going through an index (which is a lot slimmer than the table itself). From the index you would get the "row locator" and get the entry directly from the single table. If there were more tables, that would be another fetch thus slowing things down. I can't really imaging what might make it faster with multiple tables (assuming properly indexed tables).
AtliB
Did you read my example? That's right, index helps you locate the rows. But if you have more complex queries (using other fields, or joins, which don't use indexes), it may fetch more rows from disk than necessary for the additional criteria. In this case the narrower the table the better.
Konrad Garus
Sure did, and I said that "I would actually always filter on the base-fields" which are pretty selective. But I agree that if you filter by a non-indexed column (and are therefore forced to take a table scan hit), then sure it would probably be better.
AtliB
Even when you filter on an indexed column, planner may choose not to do an index scan. Especially if you filter on a few indexed columns at a time, it may choose to fetch all rows for one index and then scan them for additional criteria. One situation when this often happens is when you do an `OR` or `IN` with many alternatives, or `JOIN`.
Konrad Garus
+2  A: 

alt text

Damir Sudarevic
Thanks for the diagram! Yes, this is how you could set up the logical setup for the tables (as well as physical sure). But when realizing this design, should order+internalorder+externalorder be 1 or 3 tables and why? In principle there isn't actually an inheritance relationship between these entities (at least not from a real-world perspective)
AtliB
Over here it's shown as three, Order table having all common columns, and Internal and External Orders being some special sub-type of an order. You could probably move everything to one table by refactoring your Company and Accounts tables to hold internal and external companies.
Damir Sudarevic
A: 

I would absolutely not go with the 3-table solution. By breaking this data into 3 tables, you really can't have any queries return a full order header without joining with the foreign key, and every insert of a new order updates multiple tables and indexes, which is a problem for concurrency. I would suggest using 2 tables, one for InternalOrders and one for ExternalOrders. For those cases where you need a consolidated query of data from both sets of orders, define a view that is the union of both tables.

I am surprised to see product id and quantity as part of the order header. Every order tracking database I've ever seen breaks out order items as a separate table, using the order id as a foreign key, so that a single order can include multiple products (or the same product with various quantities, delivery times, etc.).

Paul McGuire
Regarding you last point: I tried simplifying the example by using different entities than I am actually using so any suggestions to altering the model wouldn't really help me.
AtliB
There are no problems in concurrency updating multiple tables at once, since there are database transactions, don't forget them.
Shaman
@Shaman - Yes, I'm familiar with the concepts of transactions. The concurrency issue is that, for the duration of my transaction, I may be blocking other innocent bystanders who aren't really in my way, but get held up because I'm updating some index block that we happen to share. The more tables and indexes I update within my transaction, the more likely I am to get in someone else's way. *That* is what I mean by "concurrency problem."
Paul McGuire
A: 

I'm no purist, so 3nf is good when it makes sense...but you don't have to take it for granted that it always will.

From a pragmatic standpoint, what is your goal? Your pros/cons list is a good start. I'd add a few more ideas to the list -- as you see fit.

1) Will any other table in your database need to relate (e.g., join) to this data? That's kind of the point of an RDB.

2) Will your database grow? Even if 1 table makes sense now, will it always make sense? You'll regret it, if you find yourself wanting to add more tables, and your non-normalized table is forcing you to 'work-around' it, dealing with extra rows returned, slower execution times, etc.

3) What happens when your customer gets a new external account, or what have you. Will you create a brand-new record? How will you answer questions like "What is customer so-and-so's account number?".

...

I think in general, i go for scalable, which in this case may mean 3nf. 1 table is easier to deal with in a very narrow scope, but if anything ever changes, you'll be dealing with "How do I split this table into properly related 3nf tables, without messing up all of the dependencies that have been created on it?". That one is no fun.

dave
1) no, no other table will need to related to this table / these tables. 2) yes, but not significantly (maybe one million rows in a few years). 3) The external accounts are thought of as "ad-hoc" accounts so they never need to be used again. If you want a permanent one, you would set up an internal account.
AtliB
OK -- If I understand what I've been reading so far (here and in other answers), you would do well with 4 tables -- here's why: 1) Everyone needs a customer ID (both internal and external) so they can be associated with an order. So a "customer" table is required. 2) As the purpose is to track orders, so a "orders" table is requored. 3) Presumably, your "internal" customers will re-use their accounts for future orders, so you'll need to be searching that table a lot when they log-in, or however you set it up. So an "internal accounts" table will be important. That leaves "external"...
dave
...accounts. I have some question as to why you'd need to go through the trouble of creating external account numbers. (what will they be used for -- will you search for them in the future?) External company name can go in the customer table, as well as extra info. Anyway, I'd either put external customer info in the customer table entirely, and just implicitly "know" they're external because they don't have an internal account associated with them. If you must have "account" infofor your ext. customers, an "external accounts" table would lighten the load for searches on the other tables.
dave
A: 

Is account information associated with the customer before he can order (i.e. you have enother table where you track which account ID-s given CustomerID can use)? Can you abstract all accounts to a reasonably uniform schema (that one can have a few nulls) fo that you have one universal AccountId (surrogate key) and then Account's table has say 3 varchar fields and one that tracks the kind of the account (used for billing etc.) ?

If you can do that then your order tracks just one AccountID since the order (as an entity) really doesn't care which payment method was used - it only cares that it's a legit/existing/approved AccountId for that user. Everything else is someone else's business so to speak (billing or checking funds etc.) and that enity and it's processing will need more data anyway.

This keeps your Order clean and null-free and facilitates separation of concerns as well.

Conceptually, your Order is really so called fact table - carrying only numbers and FK-s, small in item size but with a huge number of them.

So:

 Table Order (
     - OrderId
     - Quantity
     - ProductId
     - DiscountId -- sonner or latter :-)
     - AccountId
     - PaymentStatus -- probaly FK as well or predefined constant
 )

 Table Account (
     - AccountId
     - BillingInfo  -- akka ext acct number as text
     - PrincialName -- akka ext company name, some equivalent for internal acct-s
     - AdditionalData
 )
ZXX