Hi, I'm hoping someone can provide some advice on an easier way to deal with this problem. I am working on creating a flattened view of a highly normalized set of data. The goal of flattening is to provide a view that normal people can use to develop reports from. The source data contains a couple of tables, as shown:

CREATE TABLE VARIABLES (
    VARIABLE_ID INT NOT NULL IDENTITY,
    VARIABLE_NAME VARCHAR(100) NOT NULL,
    VARIABLE_DATATYPE VARCHAR(100) NOT NULL,
    PRIMARY KEY (VARIABLE_ID),
    UNIQUE (VARIABLE_NAME, VARIABLE_DATATYPE)
)

CREATE TABLE RECORD_VALUES (
    RUN_ID INT NOT NULL REFERENCES RUNS (RUN_ID) ON DELETE CASCADE,
    VARIABLE_ID INT NOT NULL REFERENCES VARIABLES (VARIABLE_ID) ON DELETE CASCADE,
    RECORD_ID VARCHAR(100) NOT NULL,
    VARIABLE_VALUE VARCHAR(1000),
    PRIMARY KEY (RUN_ID, VARIABLE_ID, RECORD_ID)
)

The VARIABLE_ID in the RECORD_VALUES table corresponds to one of the variables in the original input stream, say an address or an account balance. For an input record that has twelve variables in it, there will be twelve rows in the RECORD_VALUES table.

The inputs to the original process include records of varying widths and variable names. These are split out into name/value tuples in the RECORD_VALUES table. I am writing a procedure to reassemble the variables back into a record of the form shown below (a small made-up example follows the layout):

run_id
record_id (which is actually an underlying account number)
variable_value_1
variable_value_2
...
variable_value_n
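
For concreteness, a tiny made-up example (the IDs and values below are invented purely to illustrate the shape of the data, and assume the matching RUNS and VARIABLES rows already exist):

-- Two of the name/value rows for one record of a hypothetical run 7:
INSERT INTO RECORD_VALUES (RUN_ID, VARIABLE_ID, RECORD_ID, VARIABLE_VALUE)
VALUES (7, 1, 'ACCT001', '123 Main St');    -- variable 1 = address
INSERT INTO RECORD_VALUES (RUN_ID, VARIABLE_ID, RECORD_ID, VARIABLE_VALUE)
VALUES (7, 2, 'ACCT001', '1500.00');        -- variable 2 = account balance

-- The flattened output should contain a single row for this record:
--   run_id = 7, record_id = 'ACCT001',
--   variable_value_1 = '123 Main St', variable_value_2 = '1500.00'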

My current approach is to find the distinct variables for the given set of runs (details not important here) and then dynamically build a SQL string that creates the work table.
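
Roughly, that table-building step looks something like the sketch below (SQL Server flavoured, to match the DDL above; WORK_FLAT and the VARIABLE_VALUE_n column naming are placeholders rather than the real names):

DECLARE @cols VARCHAR(MAX);
SET @cols = '';

-- One VARCHAR(1000) column per distinct variable appearing in the runs of interest
SELECT @cols = @cols + ', VARIABLE_VALUE_'
             + CAST(v.VARIABLE_ID AS VARCHAR(10)) + ' VARCHAR(1000)'
FROM   (SELECT DISTINCT rv.VARIABLE_ID
        FROM   RECORD_VALUES rv
        WHERE  rv.RUN_ID IN (1, 2, 3)) v;    -- the runs being flattened

DECLARE @sql VARCHAR(MAX);
SET @sql = 'CREATE TABLE WORK_FLAT (RUN_ID INT NOT NULL, RECORD_ID VARCHAR(100) NOT NULL'
         + @cols + ', PRIMARY KEY (RUN_ID, RECORD_ID))';

EXEC (@sql);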

My challenge is how to efficiently load this resulting work table from the original data. Since the names and the number of variables vary with RUN_ID, the only way I can think of to approach this is something like the pseudocode below (a concrete sketch of one inner-loop step follows it):

create a cursor for the list of variables
for each variable in the list
   create a cursor to find all the record values for that variable
   for each record value
       update the appropriate record/column in the work table
   end
end
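
Concretely, each pass of the inner loop boils down to one single-cell UPDATE along these lines (placeholder names from the sketch above; the local variables stand for the current cursor row):

DECLARE @run_id INT, @record_id VARCHAR(100), @variable_value VARCHAR(1000);
-- ... FETCH the current RECORD_VALUES row into these variables, then:
UPDATE WORK_FLAT
SET    VARIABLE_VALUE_1 = @variable_value   -- the column for the current outer-loop variable
WHERE  RUN_ID = @run_id
AND    RECORD_ID = @record_id;
-- i.e. one round trip per (run, record, variable) cell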

This is going to run forever, as the parent tables have hundreds of millions of rows.

Does anyone have an idea for an approach that would let me do just one update per destination row?

Lest anyone jump on the design of the original tables: there were business reasons for doing it this way. I don't like it, but the reasons were sound.

Thanks for any thoughts you can provide.

Andrew

+2  A: 

In Oracle 10g and above:

SELECT  DISTINCT run_id, record_id, val1, val2, ..., val12
FROM    record_values
MODEL
IGNORE NAV
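        /* How the MODEL clause works here:
           - IGNORE NAV: a reference to a cell that does not exist (e.g. a record
             with no value for some variable) returns an empty string / 0 instead
             of NULL.
           - PARTITION BY: each (run_id, record_id) pair is pivoted independently.
           - DIMENSION BY: variable_id addresses the individual cells within a
             partition.
           - MEASURES: val carries the stored value; val1..val12 are the new pivot
             columns, initialized to 0.
           - RULES UPDATE: existing rows are updated and no new rows are created;
             val1[ANY] = val[1] copies the value stored under variable_id 1 into
             val1 for every row of the partition, and so on.
           The DISTINCT in the SELECT then collapses the per-variable rows of each
           partition into a single output row per (run_id, record_id). */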
PARTITION BY
        (run_id, record_id)
DIMENSION BY
        (variable_id)
MEASURES
        (variable_value AS val, 0 AS val1, 0 AS val2, ..., 0 AS val12)
RULES UPDATE
        (
        val1[ANY] = val[1], /* Put the real variable IDs in the square brackets */
        val2[ANY] = val[2],
        ...,
        val12[ANY] = val[12]
        )
Quassnoi
Tried this; it doesn't look like we have the OLAP components installed.
It's part of every Oracle 10g installation (including the free XE edition). Are you sure you're running 10g?
Quassnoi
A: 

Thank you very much. It worked. Please edit your post so that it reads val1[ANY] = val1[1], and so on.