128 views · 6 answers

In a DB I'm designing, there's one fairly central table representing something that's been sold or is for sale. It distinguishes between personal sales (like eBay) and sales from a proper company. This means there are only one or two fields which are not equally appropriate to both cases... for instance, one field is used only in one case, and another field is optional in one case but mandatory in the other. If there were more specialization, it would make sense to have a core table and then two tables with the fields relevant to the specific cases. But here, creating two tables just to hold one field each plus the reference to the core table seems both aesthetically bad and painful for the query designer and the DB software.

What do you think? Is it OK to bend the rules slightly by having a single table with weakened constraints - meaning the DB cannot 100% prevent inconsistent data from being added (in a very limited way) - or do I suck it up and create dumb-looking one-field tables?

+1  A: 

I think having these fields in one table is not going to hurt you today, and it's the choice I would go for. Just remember that as your database evolves you may need to refactor into two separate tables (if you need more fields).

John Nolan
A: 

If there are two distinct entities, "Personal Sales" and "Company Sales", then perhaps you ought to have two tables to represent those entities?

Scott Ferguson
That is the dilemma. Both share the same set of fields, so having two unrelated tables would be bad... especially since some queries will not distinguish between personal/company.
John
+2  A: 

What you're describing with one table for common columns and dependent tables for subtype-specific columns is called Class Table Inheritance. It's a perfectly good thing to do.

What @Scott Ferguson seems to be describing (two distinct tables for the two types of sales) is called Concrete Table Inheritance. It can also be a good solution depending on your needs, but more often it just makes it harder to write queries that span both subtypes.
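For illustration, a Class Table Inheritance layout for this case might look like the following sketch (table and column names here are made up for the example), using Python's sqlite3 to stand in for a real database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Core table holds the columns common to every sale
CREATE TABLE sales (
    sale_id  INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    price    NUMERIC NOT NULL
);

-- One dependent table per subtype, keyed by the core table's PK
CREATE TABLE personal_sales (
    sale_id   INTEGER PRIMARY KEY REFERENCES sales(sale_id),
    paypal_id TEXT NOT NULL          -- mandatory for personal sales
);

CREATE TABLE company_sales (
    sale_id INTEGER PRIMARY KEY REFERENCES sales(sale_id),
    sku     TEXT                     -- only meaningful for companies
);
""")

# A query across both subtypes only needs the core table
conn.execute("INSERT INTO sales VALUES (1, 'Lamp', 9.99)")
conn.execute("INSERT INTO personal_sales VALUES (1, 'pp-123')")
count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(count)  # 1
```

The cost, as the question notes, is an extra join whenever you need a subtype-specific column.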

If all you need is one or two columns that apply only to a given subtype, I agree it seems like overkill to create dependent tables. Remember that most brands of SQL database support CHECK constraints or triggers, so you can design data integrity rules into the metadata.

CREATE TABLE Sales (
 sale_id     SERIAL PRIMARY KEY,
 is_business INT NOT NULL,  -- 1 for corporate, 0 for personal
 sku         VARCHAR(20),   -- only for corporate
 paypal_id   VARCHAR(20),   -- mandatory, but only for personal
 CONSTRAINT personal_needs_paypal
   CHECK (is_business = 1 OR paypal_id IS NOT NULL),
 CONSTRAINT sku_is_corporate_only
   CHECK (is_business = 1 OR sku IS NULL)
);
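A quick way to see such a CHECK constraint in action, with the check written as `is_business = 1 OR paypal_id IS NOT NULL` so it only constrains personal rows (a sketch using Python's sqlite3; the schema is the hypothetical one from this answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sales (
    sale_id     INTEGER PRIMARY KEY,
    is_business INTEGER NOT NULL,
    sku         TEXT,
    paypal_id   TEXT,
    -- personal sales (is_business = 0) must carry a paypal_id
    CHECK (is_business = 1 OR paypal_id IS NOT NULL)
)
""")

conn.execute("INSERT INTO sales VALUES (1, 1, 'SKU-1', NULL)")  # corporate: fine
conn.execute("INSERT INTO sales VALUES (2, 0, NULL, 'pp-42')")  # personal with PayPal id: fine

rejected = False
try:
    conn.execute("INSERT INTO sales VALUES (3, 0, NULL, NULL)")  # personal without: rejected
except sqlite3.IntegrityError:
    rejected = True
print("third insert rejected:", rejected)  # True
```

So the single-table design can still have its integrity rule enforced by the database itself, not just by application code.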
Bill Karwin
A: 

News flash: the DB cannot prevent 100% of corrupt data no matter which way you cut it. So far you have only considered what I call level 1 corruption (level 0 corruption is essentially what would happen if you wrote garbage over your database with a hex editor).

I have yet to see a database that could prevent level 2 corruption (records that are syntactically correct but that, taken as a whole, mean something perverse).

Joshua
+1  A: 

There are some who insist that inapplicable fields should never be allowed, but I think this is one of those rules that someone wrote in a book and now we're all supposed to follow it without questioning why. In the case you're describing, a single table sounds like the simple, intelligent solution.

I would certainly not create two tables. Then all the common fields would be duplicated, and all your queries would have to join or union two tables. So the real question is: one table or three? But you seem to realize that.

You didn't clarify what the additional fields are. If the presence or absence of one field implies the record type, then I sometimes use that fact as the record-type indicator rather than creating a redundant type field. For example, if the only difference between a "personal sale" and a "business sale" is that a business sale has a foreign key to a company filled in, then you could simply state that you define a business sale as one with a company filled in, and no ambiguity is possible. But if the situation gets even slightly more complicated, this can be a trap: I've seen applications that say if a is null and b = c/7 then it's record type A, else if b is null and... etc., etc. If you can't do it with one test on one field, forget it and put in a record-type field.
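The "presence of the field is the type" idea can be expressed directly in a query; a minimal sketch (the `company_id` column is hypothetical), again using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    sale_id    INTEGER PRIMARY KEY,
    company_id INTEGER     -- NULL means a personal sale
);
INSERT INTO sales VALUES (1, NULL), (2, 77);
""")

# The single NULL test *is* the record-type indicator
rows = conn.execute("""
    SELECT sale_id,
           CASE WHEN company_id IS NULL THEN 'personal' ELSE 'business' END
    FROM sales
    ORDER BY sale_id
""").fetchall()
print(rows)  # [(1, 'personal'), (2, 'business')]
```

As soon as the rule needs more than this one test, a dedicated type column is the safer choice.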

You can always enforce consistency with code or constraints.

I worry a lot more about redundant data creating consistency problems than about inapplicable fields. Redundant data creates all sorts of problems. Data inapplicable to a record type? In the worst case, just ignore it. If it's a "personal sale" and somehow a company got filled in, ignore it or null it out on sight. Problem solved.

Jay
A: 

The PRO of keeping all fields in one table is that you get rid of JOINs, which makes your queries faster.

The CONTRA is that your rows grow wider, which makes your queries slower.

Which one impacts you more, totally depends on your data distribution and which queries you issue most often.

In general, splitting is better for OLTP systems, joining is better for data analysis (that tends to scan the tables).

Let's imagine 2 scenarios:

  1. Split fields. There are 1,000,000 rows, the average row size is 20 bytes, and the split field is filled once per 50 rows (i.e. 20,000 records in the split table).

    We want to query like this:

    SELECT  SUM(mainfield + COALESCE(splitfield, 0))
    FROM    maintable
    LEFT JOIN
            splittable
    ON      splitid = mainid
    

    This will require scanning 20,000,000 bytes plus nested loops (or hash lookups) to find the 20,000 matching records.

    Each hash lookup is roughly equivalent to scanning 10 rows, so the total time will be equivalent of scanning 20,000,000 + 10 * 20,000 * 20 = 24,000,000 bytes

  2. Joined fields. There are 1,000,000 rows, the average row size is 24 bytes, so the query will scan 24,000,000 bytes.

As you can see, the times tie.

However, if either parameter changes (field is filled more often or more rarely, the row size is more or less, etc), one or another solution will become better.
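The back-of-envelope arithmetic above can be checked in a few lines (using the answer's assumed numbers):

```python
rows = 1_000_000          # rows in the main table
main_row_bytes = 20       # average main-table row size
split_rows = rows // 50   # split field filled once per 50 rows -> 20,000
lookup_cost_rows = 10     # one hash lookup ~ scanning 10 rows

# Scenario 1: scan the main table, then hash-look-up into the split table
split_cost = rows * main_row_bytes + lookup_cost_rows * split_rows * main_row_bytes

# Scenario 2: one wide table with 24-byte rows, scanned end to end
joined_cost = rows * (main_row_bytes + 4)

print(split_cost, joined_cost)  # 24000000 24000000
```

Changing any one assumption (fill ratio, row width, lookup cost) tips the balance one way or the other, which is the answer's point.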

Quassnoi
I think more important than minor performance differences is maintainability. If you create two tables that have twenty columns in common and three that differ, then you have to write twice as many queries, every schema change must be done twice, etc.
Jay
@Jay: the common fields would be only stored in one table.
Quassnoi