Update: There was a comment that the question was not clear and that I made a leap of logic in claiming I would have 118 billion rows, so I have edited the text below to clarify things; see the italicized text below.

I have been struggling with this for a while now, have even gone down a few paths, but I now turn to the community for ideas. First, the problem: store six daily variables for ~25 years for the cells of a grid.

  • Number of vars = 6
  • Number of cells ~ 13 million
  • Number of days ~ 9125 (25 * 365)

Optimize the store for two different kinds of queries —

Query one: Retrieve the value of a single var for all or a portion of the cells for a single day. This is analogous to an image where every pixel is the value of a single var.

Query two: Retrieve values for all the days or a duration of days for a single var for a single cell. This is like grabbing a column out of a table in which each row holds all the vars for a single day.

So, I set about designing the db. A single table, where every row holds one day's values for one cell, would look like so

CREATE TABLE d (
    yr      SMALLINT,
    yday    SMALLINT,
    a       SMALLINT,
    b       SMALLINT,
    c       SMALLINT,
    d       SMALLINT,
    e       SMALLINT,
    f       SMALLINT,
    cell_id INTEGER
)
WITH (
    OIDS=FALSE
)

The data would look like so

yr      yday    a   b   c   d   e   f   cell_id
------------------------------------------------
1980    1       x   x   x   x   x   x   1
1980    2       x   x   x   x   x   x   1
1980    3       x   x   x   x   x   x   1
..
1980    365     x   x   x   x   x   x   1
...
1981    1       x   x   x   x   x   x   1
1981    2       x   x   x   x   x   x   1
1981    3       x   x   x   x   x   x   1
..
1981    365     x   x   x   x   x   x   1

The problem: The above table would have 13 m * 9125 rows ~ 118 billion rows. Huge indexes, slow queries, major issues loading the data, etc.
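
For reference, the two target queries against that single table would look roughly like this (the year, day, and cell id values are just placeholders):

-- query one: variable a for every cell on a single day
SELECT cell_id, a FROM d WHERE yr = 1980 AND yday = 1

-- query two: variable a for one cell across a span of days
SELECT yr, yday, a FROM d WHERE cell_id = 1 AND yr BETWEEN 1980 AND 1984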

So, I partitioned the table into years like so

CREATE TABLE d_<yyyy> (
    CHECK ( yr = <yyyy> )
) INHERITS (d)
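
For example, the 1980 child table, along with the indexes the two query patterns would want (the index names are just illustrative):

CREATE TABLE d_1980 (
    CHECK ( yr = 1980 )
) INHERITS (d);

-- query one filters on the day, query two on the cell
CREATE INDEX d_1980_yday_idx ON d_1980 (yday);
CREATE INDEX d_1980_cell_idx ON d_1980 (cell_id);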

Hmmm... still no satisfaction. I ended up with 1 + 25 tables, but each of the year tables now had ~ 4.75 billion rows, and the queries were still very slow.

So, I partitioned it all by years and days like so

CREATE TABLE d_<yyyy>_<yday> (
    CHECK ( yr = <yyyy> AND yday = <yday> )
) INHERITS (d)

Each table now has 13 million rows, and queries against a single partition are reasonably fast (although still not satisfactorily fast), but now I have ~9K tables. That has its own problems: I can't query the master table anymore, as Pg tries to lock all the child tables and runs out of memory, and I can no longer run query two above in one statement. I could do something like

SELECT a FROM d_1980_1 WHERE cell_id = 1 
UNION 
SELECT a FROM d_1980_2 WHERE cell_id = 1 
UNION 
SELECT a FROM d_1980_3 WHERE cell_id = 1 
UNION 
SELECT a FROM d_1980_4 WHERE cell_id = 1 
UNION 
...

But the above is hardly optimal.
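
Even generating the statement mechanically with UNION ALL (a rough sketch, assuming PostgreSQL 9.1+ for string_agg and format, and the d_<yyyy>_<yday> naming above) only papers over the problem:

-- build the 365-branch UNION ALL query as text, to be executed afterwards
SELECT string_agg(
           format('SELECT a FROM d_1980_%s WHERE cell_id = 1', yday),
           ' UNION ALL '
       )
FROM generate_series(1, 365) AS yday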

Any suggestions, ideas, or brainstorms would be appreciated. Perhaps Pg, or even an RDBMS, is not the right tool for this problem, in which case suggestions for alternatives would be welcome as well.

+1  A: 

Without resorting to massive indexing or duplication of data, I think it will be difficult to find a single schema design that is optimal for both of your queries.

If you cluster your data by either date or cell, retrieval by that condition can be made to run fast, but not by both at the same time.

Assuming that access by date is the most important, you could lay out your table as below:

CREATE TABLE d (
    day DATE,
    a   SMALLINT[],   -- one array element per cell
    b   SMALLINT[],
    c   SMALLINT[],
    d   SMALLINT[],
    e   SMALLINT[],
    f   SMALLINT[]
);

Observe that there is now only one row per day, and that the per-cell fields have become arrays, so each cell has its own array index. If the cell numbering is not a contiguous, one-based sequence (PostgreSQL array subscripts start at 1 by default), a small table can hold the mapping from cell ids to array indexes.
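
A minimal sketch of such a mapping table (the names cell_map and arr_idx are just placeholders):

-- maps an external cell id to its position in the per-day arrays
CREATE TABLE cell_map (
    cell_id INTEGER PRIMARY KEY,
    arr_idx INTEGER NOT NULL UNIQUE
);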

Query 1,

Retrieve the value of a single var for all or a portion of the cells for a single day.

is accomplished by, for example,

SELECT a FROM d WHERE day = '1981-01-01'
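
or, for a contiguous portion of the cells, with an array slice (the bounds 1 and 1000 are placeholders):

SELECT a[1:1000] FROM d WHERE day = '1981-01-01'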

Query 2,

Retrieve values for all the days or a duration of days for a single var for a single cell.

will be of the form

SELECT a[1000] FROM d WHERE day BETWEEN '1981-01-01' AND '1981-12-31'

I believe large arrays in PostgreSQL can be read without loading the whole value; I know BLOBs can be. If that is the case, this solution may be fast enough for you. Otherwise I would suggest keeping a second copy of the data, organized to optimize access by cell.
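
A sketch of what that second, cell-oriented copy could look like, again using one array element per day and purely as an illustration:

-- one row per cell; each array holds one element per day of the 25-year span
CREATE TABLE d_by_cell (
    cell_id INTEGER PRIMARY KEY,
    a       SMALLINT[],
    b       SMALLINT[],
    c       SMALLINT[],
    d       SMALLINT[],
    e       SMALLINT[],
    f       SMALLINT[]
);

-- query two then becomes a single-row lookup plus an array slice over the day range
SELECT a[1:365] FROM d_by_cell WHERE cell_id = 1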

Anders Johannsen