I have a time-series simulation model which has more than 10 input variables. The number of distinct simulation instances would be more than 1 million, and each simulation instance generates a few output rows every day.

To save the simulation results in a relational database, I designed tables like this.

create table SimulationModel (
    simul_id integer primary key,
    input0   text,      -- or numeric, depending on the input
    input1   text,
    ...
);

create table SimulationOutput (
    dt       timestamp,
    simul_id integer references SimulationModel (simul_id),
    output0  numeric,
    ...,
    primary key (dt, simul_id)
);

To make each model unique, I'm considering the options below.

  1. put a unique constraint on all of the input columns of the SimulationModel table

  2. redesign the SimulationModel table to group the input columns into 2~3 columns and put a unique constraint on those grouped columns (by making one long string value like "input0_input1_input2"; a rough sketch is below the list)

  3. forget about the server-side constraint and do the job on the application side
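
For example, option 2 would look roughly like this (a minimal sketch only; the column names are placeholders, and the application or a trigger would have to build the concatenated key):

create table SimulationModel (
    simul_id   integer primary key,
    input0     text,
    input1     text,
    input2     numeric,
    inputs_key text not null unique   -- e.g. input0 || '_' || input1 || '_' || input2
);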

What do you think about these options?

Is there any problem with using a unique index on more than 10 columns in a table with millions of rows?

Is there some other approach you would recommend?

(I may use PostgreSQL.)

+1  A: 

If you create two tables for your simulation model like this:

create table SimulationModel (
    id integer primary key
    -- ...
);

create table SimulationModelInput (
    simulationmodel_id integer references SimulationModel (id),
    input              text,   -- or numeric
    -- ...
    unique (simulationmodel_id, input)
);

then you can put a single unique constraint on (simulationmodel_id, input), as above, but the query to fetch a model's configuration is more complex.
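
For example (assuming the inputs are stored as text), fetching a model's configuration becomes an aggregate, and finding the model that matches a given set of input values needs a query along these lines:

-- all inputs of one model, collected back into a single row
select simulationmodel_id, array_agg(input order by input) as inputs
from SimulationModelInput
group by simulationmodel_id;

-- the model (if any) whose inputs are exactly this set;
-- the literal array must be sorted the same way as the aggregate
select simulationmodel_id
from SimulationModelInput
group by simulationmodel_id
having array_agg(input order by input) = array['a', 'b', 'c'];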

Keeper
The two tables will be joined by simul_id. My question is how to make the combination of all the input variables unique.
tk
So every single simulation model will be different from the others? I think the simplest way is to do it inside your application, because if you need to add another input you would have to recreate the database constraints.
Keeper
+1  A: 

The unique index is the way to go to make sure each set of inputs is unique.
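
In PostgreSQL that could be as simple as one multi-column unique index covering every input column (placeholder names here):

create unique index simulationmodel_inputs_uq
    on SimulationModel (input0, input1, input2);   -- list all input columns

One caveat: by default PostgreSQL treats NULLs as distinct in a unique index, so two rows that differ only in NULL inputs would both be accepted.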

iDevlop
+1  A: 

Create the unique constraint. 10 columns with a couple of million rows isn't really a large table. Also, your tests will presumably benefit from having an index to work with - or at least an index wouldn't do any harm.

dportas
+1  A: 

You could create a single "unique" column that is a hash of the other columns and call that the identity.
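
A sketch of that in PostgreSQL, assuming the inputs are text and '|' never appears in the values: instead of a stored column, a unique expression index on a hash of the concatenated inputs gives the same effect without the application having to maintain an extra column.

-- the delimiter keeps different input combinations from
-- concatenating to the same string
create unique index simulationmodel_inputs_hash_uq
    on SimulationModel (md5(input0 || '|' || input1 || '|' || input2));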

Will Charczuk
+1  A: 

I would handle the uniqueness constraint in the application, especially if only one process is creating simulations. You could then have one or more non-unique indexes on commonly-filtered input columns for efficient access.
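
For example (hypothetical column names), the application would look up an existing model before inserting, and ordinary non-unique indexes keep that lookup cheap:

create index simulationmodel_input0_idx on SimulationModel (input0);
create index simulationmodel_input1_idx on SimulationModel (input1);

-- run by the application before creating a new model
select simul_id
from SimulationModel
where input0 = $1 and input1 = $2;   -- ... and so on for every input

Without a database constraint, two concurrent writers could still insert the same combination, which is why this works best when a single process creates the simulations.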

Justin K