Hi, I'm hoping someone can provide some advice on an easier way to deal with this problem. I am working on creating a flattened view of a highly normalized set of data. The goal of the flattening is to provide a view that ordinary users can build reports from. The source data contains a couple of tables, as shown:
CREATE TABLE VARIABLES (
    VARIABLE_ID INT NOT NULL IDENTITY,
    VARIABLE_NAME VARCHAR(100) NOT NULL,
    VARIABLE_DATATYPE VARCHAR(100) NOT NULL,
    PRIMARY KEY (VARIABLE_ID),
    UNIQUE (VARIABLE_NAME, VARIABLE_DATATYPE)
)

CREATE TABLE RECORD_VALUES (
    RUN_ID INT NOT NULL REFERENCES RUNS (RUN_ID) ON DELETE CASCADE,
    VARIABLE_ID INT NOT NULL REFERENCES VARIABLES (VARIABLE_ID) ON DELETE CASCADE,
    RECORD_ID VARCHAR(100) NOT NULL,
    VARIABLE_VALUE VARCHAR(1000),
    PRIMARY KEY (RUN_ID, VARIABLE_ID, RECORD_ID)
)
The VARIABLE_ID in the RECORD_VALUES table corresponds to one of the variables in the original input stream, say an address or an account balance. For an input record that has twelve variables in it, there will be twelve rows in RECORD_VALUES.
The inputs to the original process are records of varying widths and variable names, which get split out into name/value tuples in the RECORD_VALUES table. I am writing a procedure to reassemble the variables back into a record that looks like:
run_id
record_id (which is actually an underlying account number)
variable_value_1
variable_value_2
...
variable_value_n
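For illustration (the variable names and values here are made up), three rows for one account in RECORD_VALUES:

RUN_ID  VARIABLE_ID  RECORD_ID  VARIABLE_VALUE
1       10           ACCT001    123 Main St
1       11           ACCT001    1500.00
1       12           ACCT001    ACTIVE

should come back out as one flattened row:

RUN_ID  RECORD_ID  ADDRESS      BALANCE  STATUS
1       ACCT001    123 Main St  1500.00  ACTIVE

(assuming VARIABLE_IDs 10, 11, and 12 map to ADDRESS, BALANCE, and STATUS in VARIABLES).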
My current approach is to build the work table dynamically: find the unique variables for the given set of runs (details not important here), then build a SQL string that creates the table.
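Roughly, that step looks something like this (a simplified sketch; WORK_FLAT and #SELECTED_RUNS are placeholder names, and for simplicity I'm making every column VARCHAR(1000) instead of using VARIABLE_DATATYPE):

DECLARE @cols NVARCHAR(MAX);

-- One column definition per distinct variable used by the selected runs
-- (#SELECTED_RUNS stands in for however the run set is chosen)
SELECT @cols = STUFF((
    SELECT N', ' + QUOTENAME(V.VARIABLE_NAME) + N' VARCHAR(1000) NULL'
    FROM VARIABLES AS V
    WHERE V.VARIABLE_ID IN (SELECT RV.VARIABLE_ID
                            FROM RECORD_VALUES AS RV
                            WHERE RV.RUN_ID IN (SELECT RUN_ID FROM #SELECTED_RUNS))
    ORDER BY V.VARIABLE_NAME
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, N'');

DECLARE @sql NVARCHAR(MAX) =
    N'CREATE TABLE WORK_FLAT (RUN_ID INT NOT NULL, RECORD_ID VARCHAR(100) NOT NULL, '
    + @cols + N');';

EXEC sp_executesql @sql;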
My challenge is how to efficiently load this work table from the original data. Since the names and the number of variables vary with RUN_ID, the only approach I can think of is something like:
create a cursor for the list of variables
for each variable in the list
    create a cursor to find all the record values for that variable
    for each record value
        update the appropriate record/column in the work table
    end
end
This is going to run forever, as the parent tables have hundreds of millions of rows.
Does anyone have an idea for an approach that would let me do one update per destination row?
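To make the target clearer: what I imagine the procedure would have to generate is a single set-based statement shaped something like this (hand-written here for two hypothetical variables, ADDRESS and BALANCE, with made-up VARIABLE_IDs; I don't know whether this shape is right or how well it would perform at this scale):

INSERT INTO WORK_FLAT (RUN_ID, RECORD_ID, ADDRESS, BALANCE)
SELECT RV.RUN_ID,
       RV.RECORD_ID,
       -- 10 and 11 are made-up VARIABLE_IDs standing in for ADDRESS and BALANCE
       MAX(CASE WHEN RV.VARIABLE_ID = 10 THEN RV.VARIABLE_VALUE END) AS ADDRESS,
       MAX(CASE WHEN RV.VARIABLE_ID = 11 THEN RV.VARIABLE_VALUE END) AS BALANCE
FROM RECORD_VALUES AS RV
WHERE RV.RUN_ID IN (SELECT RUN_ID FROM #SELECTED_RUNS)  -- placeholder run filter
GROUP BY RV.RUN_ID, RV.RECORD_ID;

That is, one pass over RECORD_VALUES that produces each destination row in one go, with the column list generated dynamically since it differs from run to run.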
Lest anyone jump on the design of the original tables - there were business reasons to do it this way. I don't like it, but there were good reasons for doing it.
Thanks for any thoughts you can provide.
Andrew