ansaurus

Question

Database: when to split into separate tables?

Answer 1

+1 A:

Generally you want as little redundancy in database design as possible. Go look up Normal Forms, anything below BCNF is usually hard to maintain. Some applications use more redundancy to achieve more performance for reads, but sacrifice clarity and write performance for it, such as data warehouses. Joins might be slow, but they are better than inconsistent data when the same information is stored twice.

I would therefore advise to use the lower one. Assume your sensors are not linked anymore with perfect time-stamps: Suddenly, the first layout suggestion fails badly.

Kdansky 2010-10-05 13:13:50

Thanks. Not sure what you mean though about being linked with perfect timestamps?

Mark 2010-10-05 13:33:11

Well, if you have a single sensor that always, always writes down a value for temperature and one for humidity, you can get by with a single table: (ID, Time, Humidity, Temperature). But as soon as the two do not take the measurements at the exact same time, you have to split it. And since you usually cannot guarantee that your next hardware purchase will behave identically, splitting from the get-go seems more practical.

Kdansky 2010-10-06 08:44:38

That never happens though in either design. It's always one sensor measures one thing. A single reading is just a single reading for a single sensor.

Mark 2010-10-06 16:32:52

Answer 2

A:

I suggest to act according to the normalisation rules.

Depending on your needs, you can choose a no-sql database.

elCapitano 2010-10-05 13:15:39

Answer 3

A:

This is a fairly standard design decision that needs to be made when peforming object-relational mapping.

The first option you present is known as table-per-hierarchy, and the second is table-per-concrete-class. There are other variations, such as mapping abstract classes to their own tables. Some O-R frameworks (e.g. hibernate) provide ready made code to implement some of these patterns.

Using some of these keywords in some searches should give you further information. There are lots of trade-offs to consider. I guess one of the first to think about is how many different types of sensor are you likely to have.

Another thing to consider is reporting. If you are going to writing a lot of reports that query all types of sensors, then a table-per-class would require unions, which may not be desired.

2010-10-05 13:38:14

Answer 4

A:

Tables are usually split up to into logically distinct "things", so you don't have the same "things" twice. For example, you wouldn't want:

[SensorReadings]
Id : PK
UpperLimit : FLOAT
UpperLimitAlertDelay : INT
LowerLimit : FLOAT
LowerLimitAlertDelay : INT
IsAnalog : BOOL
AnalogValue : FLOAT
IsOn : BOOL

Because you're mixing the sensor and it's readings into one row. A sensor is a different thing than its readings:

[Sensors]                      [SensorReadings]
Id                             Id
UpperLimit                     SensorID
UpperLimitAlertDelay           Reading
LowerLimit
LowerLimitAlertDelay
IsAnalog
Manufacturer
SerialNumber
LastInspectionDate
...

One thing i wouldn't so is split up "sensors" into two tables. A sensor is a sensor, that is the thing that it is. Like a customer is a customer, or a song is a song. You would have a table of songs, not a table of classical songs and another table for all the rest. If you do split up sensors into two tables, you can conceivable have two sensors with the same ID. Sensors are unique entities, they should be in the same table, and all have a unique ID.The fact that a sensor is analog or digital is the property of a sensor.

Your question is unique - your sensors can have Readings in different logical formats; some are analog floating point values, other are digital boolean values. You're struggling with how to store the "readings" of a sensor when not all sensor readings fit in the same logical column data type (i.e. float vs bool). It comes down to practicality, and what is best for the system.

You could store all the readings in a floating point number column:

[SensorReadings]
Id  SensorID  Reading
==  ========  =======
 1   3728     120.2
 2   3728     120.3
 3     89         1
 4     89         0
 5   3728     120.2
 6     89         0

But now you have to know to interpret a floating point value of 0,1 as a logical on,off. Is that difficult to do? i don't personally think so. True, it's not making full use of the data types available in the database engine, but i don't really care. You're going to join SensorReadings with Sensors, so you'll have the IsAnalog column available to help you interpret. In other words:

SELECT Id, SensorID, Reading, Sensors.IsAnalog
FROM SensorReadings sr
   INNER JOIN Sensors s ON sr.SensorID = s.SensorID

Will give you a pretty easy to parse results set:

Id  SensorID  Reading  IsAnalog
==  ========  =======  ========
 1   3728     120.2     false
 2   3728     120.3     false
 3     89         1      true
 4     89         0      true
 5   3728     120.2     false
 6     89         0      true

You could even create a helper view (or just a query), that decodes the reading as AnalogReading and DigitalReading:

CREATE VIEW SimpleSensorReadings AS

SELECT Id, SensorID, Reading AS RawReading,
    CASE Sensors.IsAnalog
    WHEN 0 THEN Reading
    ELSE NULL
    END AS AnalogReading,

    CASE Sensors.IsAnalog
    WHEN 1 THEN CAST(Reading AS BOOL)
    ELSE NULL
    END AS DigitalReading,
    Sensors.IsAnalog
FROM SensorReadings sr
   INNER JOIN Sensors s ON sr.SensorID = s.SensorID

This would give you:

[SimpleSensorReadings]
Id  SensorID  RawReading  AnalogReading  DigitalReading  IsAnalog
==  ========  ==========  =============  ==============  ========
 1   3728        120.2         120.2                       true
 2   3728        120.3         120.3                       true
 3     89            1                       true         false
 4     89            0                      false         false
 5   3728        120.2         120.2                       true
 6     89            0                      false         false

It depends who has to deal with the results. i can easily imagine code first checking the "IsAnalog" column, then reading out either the AnalogReading or DigitalReading as appropriate.

You could do what you originally suggested; split them up into multiple tables. But now the problem becomes: How do you access the data? It seems to me, if i were having this system of sensor readings, at some point i'm going to have to do something with them - show them to a user. Now i'd have to jump through hoops to rejoin the data:

SELECT ID, AnalogSensorID AS SensorID, 
   Value AS RawReading, Value AS AnalogReading, 
   true AS IsAnalog
FROM AnalogSensorReadings

UNION ALL 

SELECT ID, SwitchSensorID AS SensorID, 
   CAST(IsOn AS float) AS RawReading, null AS AnalogReading, IsOn AS DigitalReading,
   false AS IsAnalog

giving you

Id  SensorID  RawReading  AnalogReading  DigitalReading  IsAnalog
==  ========  ==========  =============  ==============  ========
 1   3728        120.2         120.2                       true
 2   3728        120.3         120.3                       true
 1     89            1                       true         false
 2     89            0                      false         false
 3   3728        120.2         120.2                       true
 3     89            0                      false         false

Except now the "Id" is also hard to decode, because two different readings can have the same "ID". A reading is a reading, and should be unique.

A compromise you are probably looking for, is what you originally had.

[SensorReadings]
Id  SensorID  AnalogReading  DigitalReading
==  ========  =============  ==============
 1   3728        120.2         
 2   3728        120.3         
 3     89                       true
 4     89                      false
 5   3728        120.2         
 6     89                      false

Yes this leaves you with a lot of (null) values - but the expense of joining tables back together is a practical problem that has to factor into your design decision.

i think of it like the registry in Windows. A key contains a value. You don't really care how that value is stored, as long as you can read it, as the type is logically is. To accomplish that in a database i would use multiple data-type columns, and read them as appropriate.

Ian Boyd 2010-10-05 14:16:33

Answer 5

A:

The question is: From the point of view of your system, are they the same thing? If so, they belong in one table. If not, they belong in two tables.

Usually this is a no-brainer. "Employee" and "Insurance Plan" are two different things. "Employee named Bob" and "Employee named Sally" are two instances of the same thing.

Sometimes it's trickier. Are "Truck" and "Boat" two different things, or are they both just subtypes of "Vehicle"? It depends on the point of view of your system. If you're selling them, they're probably the same thing. You probably don't care that one floats and the other doesn't, you just care how much they cost and how many you have in stock and the like. That is, you keep the same data about them and you use them in the same queries. But if your system manages a fishing fleet and for Boat you care about things like who the crew members are and how much they are paid and how many fish they caught today, and for Truck you care about things like when it will show up at the dock to pick up today's catch and how much you have to pay the trucking company per pound, they are likely two very different things.

Sure signs that they ARE the same thing are:

They have the same data (not the same values of course, but the same fields)
Queries would routinely be applied against both with little or no distinction

If these are not true, they are probably not the same thing.

That is, if you find that for type 1, field A will have a value and field B will always be null, while for type 2, field A will be null and field B will have a value, then they fail the data test.

If you find that if you put them in one table, you will routinely have to add a type check to all queries so you only get the right one, then they fail the data test. If you conclude that if you put them in two separate tables you would have to constantly write queries that do a join or a union on the two tables to pick up both, then they pass the data test.

In your example, you haven't told us what you intend to do with the data, so I can't discuss the query test. Apparently you have different data for the two types of sensors -- a number versus an on/off -- so right away that's a vote for two different things. But then we get back to how this actually matters to your system. If for temperature probes you will produce graphs of temperature over time or monitor whether they are within certain ranges, and for on/off switches you will trigger a process when they go on and stop it when they go off, they would probably be two different things. If in both cases you will produce reports of the value -- whether a number or an on/off -- at any given time, then they could be the same thing.

I'm inclined to think they're probably different, but without knowing more, I can't really say.

Jay 2010-10-05 17:46:45

Answer 6

A:

I suspect it will depend on relationships to other entities that are not shown. If there are a lot of entities that are related to one type of sensor but not the other, then it might make sense to split them out - otherwise, I would be inclined to use the simpler design (ie. the two table approach, rather than the four table approach).

A couple of changes I would suggest:

Split the "UpperLimitAlertDelay" and the "OnTooLongAlertDelay" into separate fields - as I understand it, they are different values, and should therefore (under 1NF) be separate fields.
Add a datetimestamp field to the Reading table.

Mark Bannister 2010-10-06 10:40:11

ansaurus

tags:

views:

answers:

Database: when to split into separate tables?

related questions