I wish to perform an experiment many different times. After every trial, I am left with a "large" set of output statistics -- let's say, 1000. I would like to store the outputs of my experiments in a table, but what's the best way...?

Option 1

Have a table with 1000 columns. Seems like a bad idea. What if the number of statistics one day exceeds the maximum number of columns?

Option 2

Have a table with three columns: ID, StatisticType, and StatisticValue. That way, you can have as many statistics as you want. However, reading a single experiment's statistics becomes more complicated. Moreover, what if different statistics have different data types?

Any suggestions?

+6  A: 

Option 2, with ID, TrialID, StatisticID, StatisticValue

With proper indexing, it will perform fairly well (you can use PIVOT to get the values out on columns fairly easily in SQL Server 2005).

When the statistics are different datatypes, the problem becomes more interesting, but in many cases, I just up-size the datatype (sometimes ints just end up in the money field). For other non-compatible types, the best design in my mind is really separate tables for each type, but I've also seen multiple columns or a free-form text column.
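A minimal sketch of this layout, using Python's built-in sqlite3 as a stand-in for the real DBMS. The table name TrialStatistic and the sample values are assumptions for illustration. SQLite has no PIVOT, so conditional aggregation is used here instead; on SQL Server 2005 the PIVOT operator would play the same role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TrialStatistic (           -- hypothetical table name
    ID             INTEGER PRIMARY KEY,
    TrialID        INTEGER NOT NULL,
    StatisticID    INTEGER NOT NULL,
    StatisticValue REAL    NOT NULL,
    UNIQUE (TrialID, StatisticID)       -- one value per statistic per trial; also the lookup index
);
""")

# Made-up sample data: (TrialID, StatisticID, StatisticValue)
rows = [(1, 1, 0.5), (1, 2, 12.0), (2, 1, 0.7), (2, 2, 9.0)]
conn.executemany(
    "INSERT INTO TrialStatistic (TrialID, StatisticID, StatisticValue) VALUES (?, ?, ?)",
    rows,
)

# Turn rows into columns with conditional aggregation (a portable stand-in for PIVOT).
pivoted = conn.execute("""
SELECT TrialID,
       MAX(CASE WHEN StatisticID = 1 THEN StatisticValue END) AS stat_1,
       MAX(CASE WHEN StatisticID = 2 THEN StatisticValue END) AS stat_2
FROM TrialStatistic
GROUP BY TrialID
ORDER BY TrialID
""").fetchall()

print(pivoted)
```

With 1000 statistics you would probably generate the CASE (or PIVOT) column list from the set of statistic IDs rather than type it by hand.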

Cade Roux
+1  A: 

You can have one table for statistic types, including their datatype and then a separate table for every datatype, e.g., NumericStats, TextStats, DateTimeStats, which all have a foreign key to the StatisticTypes table.
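Roughly what that could look like, sketched with Python's sqlite3 module standing in for the real DBMS. The column names, the DataType labels, the `store` helper and the sample values are all assumptions made for the example, not something prescribed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE StatisticTypes (
    StatisticTypeID INTEGER PRIMARY KEY,
    Name            TEXT NOT NULL UNIQUE,
    DataType        TEXT NOT NULL CHECK (DataType IN ('numeric', 'text', 'datetime'))
);
CREATE TABLE NumericStats (
    TrialID         INTEGER NOT NULL,
    StatisticTypeID INTEGER NOT NULL REFERENCES StatisticTypes (StatisticTypeID),
    Value           REAL NOT NULL,
    PRIMARY KEY (TrialID, StatisticTypeID)
);
CREATE TABLE TextStats (
    TrialID         INTEGER NOT NULL,
    StatisticTypeID INTEGER NOT NULL REFERENCES StatisticTypes (StatisticTypeID),
    Value           TEXT NOT NULL,
    PRIMARY KEY (TrialID, StatisticTypeID)
);
CREATE TABLE DateTimeStats (
    TrialID         INTEGER NOT NULL,
    StatisticTypeID INTEGER NOT NULL REFERENCES StatisticTypes (StatisticTypeID),
    Value           TEXT NOT NULL,  -- ISO 8601 string; a real DBMS would use a datetime type
    PRIMARY KEY (TrialID, StatisticTypeID)
);
""")

# Fixed mapping from the declared datatype to the table that holds it.
TABLE_FOR = {"numeric": "NumericStats", "text": "TextStats", "datetime": "DateTimeStats"}

def store(trial_id, stat_name, value):
    """Look up the statistic's declared datatype and insert into the matching table."""
    type_id, data_type = conn.execute(
        "SELECT StatisticTypeID, DataType FROM StatisticTypes WHERE Name = ?",
        (stat_name,),
    ).fetchone()
    conn.execute(
        f"INSERT INTO {TABLE_FOR[data_type]} (TrialID, StatisticTypeID, Value) VALUES (?, ?, ?)",
        (trial_id, type_id, value),
    )

# Made-up statistic types and values.
conn.executemany(
    "INSERT INTO StatisticTypes (Name, DataType) VALUES (?, ?)",
    [("mean_latency", "numeric"), ("status", "text"), ("finished_at", "datetime")],
)
store(1, "mean_latency", 3.25)
store(1, "status", "ok")
store(1, "finished_at", "2009-01-15T10:30:00")

numeric = conn.execute("SELECT Value FROM NumericStats WHERE TrialID = 1").fetchall()
text = conn.execute("SELECT Value FROM TextStats WHERE TrialID = 1").fetchall()
```

The trade-off is that reading everything for one trial now takes one query per datatype table (or a UNION with casts).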

Mark Cidade
A: 

Three columns: ID, Experiment and Value. It's not that complicated to get the result from one experiment, for example: SELECT * FROM table WHERE Experiment = 5;

matli
+1  A: 

If your DBMS offers an XML datatype, you may want to consider it.

Pros:

  • Fetch all output statistics from a trial from one row
  • With the right schema, the number of statistics can differ from trial to trial
  • Most DBMSs with XML compress your data nicely

Cons:

  • Ties your implementation to a particular DBMS
  • Not as easy to query your results

Cheers.

Corbin March
Most, if not all, DBs don't do a good job of indexing individual elements within an XML blob.
ConcernedOfTunbridgeWells
+2  A: 

Columns in relational databases are a good place to store data that is referenced in searches, ordering and other processing. If you're just going to store a large number of values, you can use some other format, like XML, and store them all in a single column. In this case XML gives you readability, maintainability, flexibility and maybe even some searchability (SQL Server 2005+).

Orlangur
+3  A: 

I second Cade Roux's answer, with some additional thoughts and explanation.

The key of the table will be (trialID, statisticType). There will be one row for each statistic for each trial, so 1000 rows per trial. To get the values for a single experiment, select the rows for the specific trialID (as shown by matli).

You could add a "Trial Master" table that has a single row for each trial (trialID as key) with relevant information (date, time, comments, person ...) about that particular trial. This will allow grouping and analysis based on trial attributes. For instance, did morning trials perform differently than afternoon trials, or did trials by Tarzan perform differently than trials by Jane?

You might also add a "Stat Master" table that has a row for each statisticType and that contains attributes about the statistic. This could be valuable if the various stats have different attributes, or if you want to group certain stats.
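Here is a small sketch of the two master tables and the kind of grouping they enable, using Python's sqlite3 as a stand-in DBMS. Column names such as RunAt, Person and StatGroup, and all the sample rows, are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TrialMaster (
    TrialID INTEGER PRIMARY KEY,
    RunAt   TEXT NOT NULL,      -- ISO 8601 timestamp
    Person  TEXT NOT NULL
);
CREATE TABLE StatMaster (
    StatisticType INTEGER PRIMARY KEY,
    Name          TEXT NOT NULL,
    StatGroup     TEXT          -- optional label for grouping related stats
);
CREATE TABLE TrialStatistic (
    TrialID        INTEGER NOT NULL REFERENCES TrialMaster (TrialID),
    StatisticType  INTEGER NOT NULL REFERENCES StatMaster (StatisticType),
    StatisticValue REAL NOT NULL,
    PRIMARY KEY (TrialID, StatisticType)
);
INSERT INTO TrialMaster VALUES
    (1, '2009-01-15T09:00:00', 'Tarzan'),
    (2, '2009-01-15T15:00:00', 'Jane'),
    (3, '2009-01-16T10:00:00', 'Jane');
INSERT INTO StatMaster VALUES (1, 'score', 'quality');
INSERT INTO TrialStatistic VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 1, 40.0);
""")

# Did trials by Tarzan perform differently than trials by Jane?
by_person = conn.execute("""
SELECT m.Person, AVG(s.StatisticValue)
FROM TrialStatistic AS s
JOIN TrialMaster AS m ON m.TrialID = s.TrialID
JOIN StatMaster  AS t ON t.StatisticType = s.StatisticType
WHERE t.Name = 'score'
GROUP BY m.Person
ORDER BY m.Person
""").fetchall()

# Did morning trials perform differently than afternoon trials?
by_period = conn.execute("""
SELECT CASE WHEN CAST(strftime('%H', m.RunAt) AS INTEGER) < 12
            THEN 'morning' ELSE 'afternoon' END AS period,
       AVG(s.StatisticValue)
FROM TrialStatistic AS s
JOIN TrialMaster AS m ON m.TrialID = s.TrialID
GROUP BY period
ORDER BY period
""").fetchall()

print(by_person)
print(by_period)
```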

Have fun!

tomjedrz
A: 

It doesn't matter. Since you haven't mentioned what you plan to use the data for, how you store it is pretty much meaningless. You could store it in CSV, and meet your requirements (which were, basically, how do I store 1000 values).

The queries you wish to run against this data, and the domain that you are modeling, make all the difference in the world.

Mark Brackett