tags:

views:

30

answers:

3

Sybase problem: The selected values of column1 (id) of a huge_tbl table (60 million+ records) needs to be copied into small_tbl table which is comparatively small. The criteria to select rows from the huge_tbl depends on column2 values (mydate).

huge_tbl: - id is primary key of huge_table - mydate column is a datetime field. - There is an index (mydate,id) on huge_tbl. - huge_tbl has many more columns apart from id and mydate.

small_tbl: - small_tbl has 2 columns: identity column is an identity column id2 is similar to id of huge_tbl

Since the number of records to be inserted is quite large and to avoid the filling up the log space, the insertion is divided into chunks using set rowcount.

Due to this, next iteration has to start from where the previous iteration completed. Suppose the last record inserted in 1st iteration had mydate value as "02/10/2010 09:00:00". There might be more records with the same mydate values in the huge_tbl which were not selected because of "set rowcount ###". The next iteration has to select records where mydate value starts from "02/10/2010 09:00:00" but should also filter out the records already selected in previous iteration.

I tried few approaches but it either ends with duplicate records in small_tbl or it takes too long to process.

Any pointers to this problem would be appreciated.

Thanks in advance, Tom

For example -The below approach is taking too long:

while (row_count > 0) begin

insert into small_tbl(id2) select id from huge_tbl where mydate between and and 1 = case when ((mydate = @last_max_mydate) and id > @last_max_id)) then 1 when (mydate > @last_max_mydate) then 1 else 0

select @row_count = @@rowcount

select @last_max_identity = max(identity) from small_tbl select @last_max_id2 = id2 from small_tbl where identity = @last_max_identity select @last_max_mydate = mydate from huge_tbl where id = @last_max_id2

end


A: 

create table as select statement does not fill database log. Try this one method instead of rowcount split.

kolchanov
Agreed. But I would not put it that way. Whereas INSERT logs every row, and is recoverable; SELECT INTO (which performs an automatic CREATE TABLE) does not log rows, it logs only Allocations to the table, and is not recoverable. You have to set the SELECT-INTO db_option, which prevents further log dumps. Not acceptable for production systems.
PerformanceDBA
A: 

This problem is easy to solve, and has been done a thousand times.

First, if you do not mind me saying so, you are too deep into your solution, and your table, you do not see the problem for what it is. So please answer my questions without worrying about what they mean or intend.

1 Forget the IDENTITY column on huge_table, or what your PRIMARY KEY constraint is defined to be, what is the true relational primary key ? If you are not sure, post all the explicit and CREATE INDEX statements for huge_table.

2 What is the purpose of copying huge_table to small_table ? Whenever you duplicate data (even temporarily), there is a larger problem that is not being identified; if you resolve that, the need to copy will be eliminated.

PerformanceDBA