views:

94

answers:

3

We have a data collection system that collects measurements from environmental sensors that measure the velocity of water flowing through a river or channel. Each measurement generates a fixed number of values (Date, Time, Temperature, Pressure, etc.) plus a list of velocity values.
Originally the sensors supplied three velocity values, so I simply stored each value in its own column of a single table in a Firebird database. Later on, sensors were introduced that could output up to nine velocity values, so I simply added six more columns. Even though most sensors use fewer than nine values, I reckoned it would not be a problem if most of the columns just contained zeroes.
But now I'm facing a new generation of sensors that can output anything from 1 to 256 values, and I assume it would not be very efficient to add another 247 columns, especially since most measurements will still contain only 3 to 9 values.
Since the measurements are collected every 10 minutes, and the database contains all data for 30 to 50 sensors, the total amount of data becomes quite significant after a few years, yet it must still be possible to generate overviews/graphs for any arbitrary period of time.

So what would be the most efficient way to store the variable list of values?
Since each record has its own unique ID, I assume I could just store all velocity values in a separate table, each value tagged with its record ID. I just have the feeling that this would not be very efficient and that it would get very slow after a while.

+4  A: 

Databases can handle large amounts of data in a table if you use efficient indexes. So you can use this table structure:

create table measurements (
     id integer not null,  -- unique ID of the measurement record
     seq integer,          -- sequence number of the value, between 1 and 256
     ts timestamp,         -- timestamp of the measurement
     value decimal(18, 4)  -- velocity value; precision/scale here are only an example
);

Create indexes on id, on (id, seq), and on ts. That will allow you to search efficiently through the data. If you distrust your database, just insert a few million rows and run a couple of selects to see how well it fares.
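For illustration, here is a sketch of what those indexes and a typical range query could look like in Firebird-style SQL (index names and the date range are only examples, not part of the original answer; the composite index on (id, seq) also covers lookups by id alone):

create index idx_measurements_id_seq on measurements (id, seq);
create index idx_measurements_ts on measurements (ts);

-- Fetch all values for an arbitrary period, ordered for plotting
select id, seq, ts, value
from measurements
where ts between '2010-01-01 00:00:00' and '2010-03-31 23:59:59'
order by ts, seq;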

For comparison: I have an Oracle database here with 112 million rows, and I can select a record by timestamp or ID within 120 ms (0.12 s).

Aaron Digulla
A: 

You could save serialized data in a text field, for example JSON-encoding the measurements as:

[<velocity-value-1>, <velocity-value-2>, ...]

Then, in your code, deserialize the values after querying.

This should work well if you only filter your queries by the other fields, and not by the saved values. If you do filter by the values, using them in WHERE clauses will be a nightmare.
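A minimal sketch of this approach in Firebird-style SQL, assuming the JSON array is simply stored as text (Firebird has no native JSON type; table and column names are only illustrative):

create table measurements (
    id integer not null primary key,
    ts timestamp,
    temperature decimal(9, 2),
    pressure decimal(9, 2),
    velocities blob sub_type text  -- JSON array, e.g. '[0.42, 0.45, 0.47]'
);

insert into measurements (id, ts, temperature, pressure, velocities)
values (1, '2010-06-01 10:00:00', 18.5, 101.3, '[0.42, 0.45, 0.47]');

The application parses the JSON string after fetching the row; the database never looks inside it.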

orip
Don't you think the de-serialisation of the values when I need them to produce a graph or report (which may well contain thousands of measurements) will be a giant performance hog?
Cees Meijer
Not really - the values are passed over the wire in serialized form anyway, and are deserialized by your DB library. The performance problems appear when you want the DB to do something intelligent with the values - as long as it only passes them through to your code, you shouldn't have a performance problem.
orip
A: 

I'd go with a second table:

table measurements (Id, DateTime, Temperature, Pressure)
table velocity (Id, MeasurementId, Sequence, Value)

Velocity.MeasurementId references Measurements.Id.
Velocity.Sequence is the index of the velocity value for that measurement (1-256).
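A rough Firebird-style DDL sketch of that layout (column types are assumptions, and some names are adjusted to avoid SQL reserved words):

create table measurements (
    id integer not null primary key,
    measured_at timestamp,
    temperature decimal(9, 2),
    pressure decimal(9, 2)
);

create table velocity (
    id integer not null primary key,
    measurement_id integer not null references measurements (id),
    seq integer not null,          -- position of the value within the measurement (1-256)
    velocity_value decimal(9, 4)
);

-- Speeds up fetching all velocities of one measurement in order
create index idx_velocity_measurement_seq on velocity (measurement_id, seq);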

Populate these tables with data as close to real-world as possible and test the SQL statements to find the best indexes.

Douglas Tosi