views: 184
answers: 4

What is the best way to store a large number of data points?

For example temperature values which are measured every minute over lots of locations?

SQL databases with one row per data point don't seem very efficient.

+1  A: 

I would like to know why you reckon it is "not efficient". You probably need to explain your data model and schema to give better context for the scenario.

Storing multiple data points in a single row, when they are not related to each other and should stand on their own, is not a good approach. Meshing them together results in counter-intuitive, quirky query statements when you need to pull out the correct data points for a given scenario.

We have done work in a power station before, collecting from various systems and metering equipment a wide variety of gas and electrical parameters that need to be monitored and aggregated. Readings can come in anywhere from every 3-5 minutes to every 30-60 minutes depending on the type of parameter. This naturally results in millions of records per month.

The key is indexing the tables properly so that their physical order matches the sequence in which the records arrive (a clustered index). New pages and extents are then created and filled sequentially by incoming data, which prevents massive page splits and reshuffling.
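As a minimal sketch of this idea (using SQLite, my choice here, where a WITHOUT ROWID table stores rows physically ordered by its primary key, roughly analogous to a clustered index — the table and column names are illustrative, not from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Keying on (Timestamp, LocationID) means rows arriving in time order are
# appended sequentially at the end of the key space, which is the behaviour
# the answer describes: no page splits, no reshuffling.
conn.execute("""
    CREATE TABLE readings (
        Timestamp   TEXT    NOT NULL,
        LocationID  INTEGER NOT NULL,
        Temperature REAL    NOT NULL,
        PRIMARY KEY (Timestamp, LocationID)
    ) WITHOUT ROWID
""")

rows = [
    ("2009-01-01T00:00", 1, 21.5),
    ("2009-01-01T00:00", 2, 19.8),
    ("2009-01-01T00:01", 1, 21.6),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# A range scan over a time window walks the clustered order directly.
window = conn.execute(
    "SELECT LocationID, Temperature FROM readings "
    "WHERE Timestamp BETWEEN '2009-01-01T00:00' AND '2009-01-01T00:00'"
).fetchall()
print(window)  # → [(1, 21.5), (2, 19.8)]
```

On SQL Server (which this answer likely has in mind), the equivalent would be a clustered index on the timestamp column.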

icelava
That's a very good point regarding the physical order the data arrives in and the clustered index.
Mitch Wheat
Table partitioning by date/time stamp is another method to spread the load, especially if you need to keep history for an extended period of time.
Ken Gentle
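To illustrate the partitioning idea in the comment above, here is a rough sketch using manual month-named tables in SQLite (engines like SQL Server or PostgreSQL offer real declarative partitioning; the `readings_YYYY_MM` naming scheme is a hypothetical convention of mine):

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")

def partition_for(ts: str) -> str:
    # One table per calendar month, so a whole month of history can be
    # archived or dropped in one go instead of deleting row by row.
    month = datetime.fromisoformat(ts).strftime("%Y_%m")
    name = f"readings_{month}"
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {name} "
        "(LocationID INTEGER, Temperature REAL, Timestamp TEXT)"
    )
    return name

table = partition_for("2009-01-15T12:00")
conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)",
             (1, 20.1, "2009-01-15T12:00"))
print(table)  # → readings_2009_01
```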
A: 

Thanks, I didn't mean to store them in one row; I was thinking about storage mechanisms other than SQL databases. Millions of records per month sounds good to me.

Stephan Schmidt
+1  A: 

A table like this may work:

LocationID, Temperature, Timestamp

I don't see why this wouldn't be efficient. This is what databases are for, after all.

Jeremy Cantrell
A: 

The key question may be: how do you need to access the data later?

If you need to associate each point with a timestamp and location ID, and later retrieve individual measurements by time/time range and location from multiple clients, a database may indeed be the most efficient for retrieval.

OTOH, if your client will load and process a whole day's data for one location, storing the data in one file per location and day reduces dependencies and may be easier.
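The file-per-location-per-day alternative could look something like this (the `<root>/<location>/<YYYY-MM-DD>.csv` layout and the CSV format are hypothetical conventions of mine, not anything prescribed in the thread):

```python
import csv
import tempfile
from pathlib import Path

def append_reading(root, location_id, timestamp, temperature):
    # One file per location per day, named by the date prefix of an
    # ISO-8601 timestamp; appending keeps the day's readings in order.
    day = timestamp[:10]  # "YYYY-MM-DD"
    path = Path(root) / str(location_id) / f"{day}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([timestamp, temperature])
    return path

root = tempfile.mkdtemp()
p = append_reading(root, 1, "2009-01-01T00:00", 21.5)
append_reading(root, 1, "2009-01-01T00:01", 21.6)

# A client processing one location's day reads just that single file.
with open(p) as f:
    rows = list(csv.reader(f))
print(rows)  # → [['2009-01-01T00:00', '21.5'], ['2009-01-01T00:01', '21.6']]
```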

Other concerns are backup and archival, and whether your users can/should deal with those themselves.

peterchen
How the data is accessed and queried later will be key in deciding what non-clustered indexes to configure on the tables.
icelava