tags:

views:

78

answers:

3

I'm looking for a little advice on how to setup a database to hold numeric data for a modeling application. My users have a spreadsheet that holds data for use in a modeling application. The data is structured as such: Each tab is a line of business, columns are years and rows are elements. The cells are standard decimal numbers xx.xx etc.

     2005 2006  2007  2008  2009  2010  2011 2012
data1   2.5   3.5  
data2    
data3   

The second application can receive data from either an ole link from excel or odbc query. I'd like to put the data into a database (sql server or oracle) but I am unsure on how to structure the tables balancing how the data gets into the database, user interactions and then output queries to the second application. The line of business, elements and years are all not fixed.

I know it is basically a pivot table so the answer I am looking at is a table with {line, element, year, value}. Given the problems associated with getting data in and out with this format would I be better off with a table of {line, element, year1..yearx} with some arbitrary number of columns for future years? This isn't a classic case of entity attribute value but somewhat similar. The elements don't change very often but there are a large number of them 300+. I could group them into separate tables and use a structure like {line, year, element1..elementX} This would probably be the simplest to develop, but doesn't seem "right".

The output queries would typically be kept to a single data element with line, year, and values passed to the second application via odbc.

+2  A: 

This isn't EAV, it's a different antipattern that I call Metadata Tribbles. That is, they seem friendly and convenient, but they tend to multiply out of control.

Define a second table with the year as one column, the numeric data value as one additional column.

Don't try to write a query for all element values for a given line on the same result row. Instead, use a query that returns multiple rows, and write some application code to iterate over them to collect all the values you need.

Bill Karwin
Or use a pivot table. They aren't hard to do in sql server.
Byron Whitlock
+1  A: 

Maybe something like this:

lineofbusiness = (id, name)
elements = (lineofbusinessid, year, value)

So the elements table might look like this:

     lineofbusinessid  year  value
       1               2009   2.3
       1               2010   4.0
       1               2011   1.0
       2               2009   9.0
  

etc.

Vincent Ramdhanie
A: 

Since you're planning on using a relational database management system, I'd store this data in a set of normalized tables. A first stab comes up with (this is all based on SQL Server 2005, and sorry, ):

CREATE TABLE MyData
 (
   LineOfBusiness  varchar(50)  not null
  ,Year            smallint     not null
  ,Element         varchar(50)  not null
  ,Value           float        not null
  ,constraint PK_MyData
    primary key (LineOfBusiness, Year, Element)
 )

Having two varchar(50)s in your primary key could be deemed inefficient, particularly if you end up with a lot of data. (a) I wouldn't sweat it until you hit 64k, but (b) by the time you hit megabytes of data, it'll be far too late to go back and revise your architecture--so might as well get it right the first time.

It could be efficient to move LineOfBusiness to a lookup table:

CREATE TABLE LineOfBusiness
 (
   LineOfBusinessId  int          not null
    constraint PK_LineOfBusiness
     primary key
  ,Description       varchar(50)  not null
 )

If "elements" can be repeated between lines of business, it definitely is more efficient to move it to a lookup table:

CREATE TABLE Element
 (
   ElementId    int          not null
    constraint PK_Element
     primary key
  ,Description  varchar(50)  not null
 )

Year is a simple numeric value falling between 1900 and 2100 (and if not, then huh?), so there's no need to normalize it out. Whether or not a lookup table for year is useful depends on the application requirements. (Maybe having FirstYear and LastYear columns in LineOfBusiness makes sense?)

Based on the above two tables and working in relational integrity, you'd end up with

CREATE TABLE MyData
 (
   LineOfBusinessId  int       not null
    constraint FK_MyData__LineOfBusiness
      foreign key references LineOfBusiness (LineOfBusinessId)
  ,Year              smallint  not null
  ,ElementId         int       not null
    constraint FK_MyData__Element
      foreign key references Element (ElementId)
  ,Value             float     not null
  ,constraint PK_MyData
    primary key (LineOfBusinessId, Year, ElementId)
 )

This leaves many questions open regarding how to load data and ensure/preserver validity, and of course queries (and probably pivot queries) will need to be written, but you can spin wheels and get nowhere if your initial storage designs are inadequate.

Philip Kelley