views: 56
answers: 3

I want to add new columns to a table which already contains billions of rows. The new columns are derived from existing columns.

For example,

new_col1 = old_col1 + old_col2
new_col2 = old_col1 / old_col2

I am trying to do this in the following way:

Add new columns

ALTER TABLE table_name
ADD ( column_1    column_definition,
      column_2    column_definition,
      ...
      column_n    column_definition )

Read rows one by one from the table and fill in the values for the new columns.

There is no primary key in the table, so I cannot refer to an individual row. To read the rows one by one, I would have to do a select *, which would give a huge result set (considering the billions of records).

Is there any better way to do this?

A: 

Use a stored procedure that updates rows 100 at a time, and add the stored procedure as a job that runs every 30 seconds or so.
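
A rough sketch of that idea, assuming SQL Server syntax (the procedure name is made up, and NULL in the new column is used to mark rows that have not been processed yet):

CREATE PROCEDURE dbo.fill_new_columns
AS
BEGIN
    -- Fill in 100 not-yet-processed rows per run; NULL marks
    -- unprocessed rows, so no primary key is needed to track progress.
    UPDATE TOP (100) table_name
    SET new_col1 = old_col1 + old_col2,
        new_col2 = old_col1 / old_col2
    WHERE new_col1 IS NULL;
END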

Flakron Bytyqi
There is no primary key, so how will he know which hundred are already updated?
TheVillageIdiot
UPDATE ... LIMIT 100, just supposing this is MySQL.
Flakron Bytyqi
What do you mean by "do an update by 100 of them"? Is there any way to do selects in batches of 100 records? Something like this: select * from ... range 1 to 100; select * from ... range 101 to 200; select * from ... range 201 to 300...
Prashant
Yeah. There is no primary key, and LIMIT 100 would return a random set of rows. I simply want to scan through the table and update the new column values as I go.
Prashant
There's no reason to wait 30 seconds. It looks like the goal is just to save memory, not to preserve the responsiveness of the database during the update.
pascal
Just as an example, nothing else: an update of 100 records can happen in 2 seconds at most (depending on the data) if hosted on good hardware.
Flakron Bytyqi
+1  A: 

Different DBMSs have different SQL dialects, so it is useful to specify which one you are using in the question.

In SQL Server you could use a computed column, but this would recalculate the result every time you select the data. You could flag it as PERSISTED, though it may take a while to make that change, and you can't use this approach at all if you are going to remove the old columns afterwards.
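
For example (just a sketch using the asker's column names; note that persisting old_col1 / old_col2 will fail if any row has old_col2 = 0):

ALTER TABLE table_name
ADD new_col1 AS (old_col1 + old_col2) PERSISTED,
    new_col2 AS (old_col1 / old_col2) PERSISTED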

Alternatively, create the new column allowing NULLs and then update it in batches:

UPDATE TOP (1000) table_name SET new_col1 = old_col1 + old_col2 WHERE new_col1 IS NULL

Again, the query is for SQL Server, but there will be alternatives for your DBMS.
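
If you would rather run it to completion in one script than as a scheduled job, the same statement can be looped until nothing is left (still SQL Server syntax, just a sketch):

WHILE 1 = 1
BEGIN
    -- Each pass fills in another 1000 rows; NULL marks the rows
    -- that still need processing.
    UPDATE TOP (1000) table_name
    SET new_col1 = old_col1 + old_col2,
        new_col2 = old_col1 / old_col2
    WHERE new_col1 IS NULL;

    IF @@ROWCOUNT = 0 BREAK;  -- no rows left to update
END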

Also read Mr Hooper's answer below about adding an index to the new column, to make sure that the performance of the UPDATE doesn't degrade as more data is processed. The update is a read and a write operation; the index will speed up the reads and slightly delay the writes (because the index has to be maintained), but it should be worthwhile.

Chris Diver
+1  A: 

I think Mr Diver's method would be fine if you also added an index on one of your new columns; otherwise, as the job progresses, it will have to scan more and more rows to find the ones it hasn't already updated. Adding an index means it doesn't have to do that. A possible drawback is that the index's selectivity will be frightful when the column is first created (every row is NULL), but I don't think that is a problem here, as you only care about NULL versus NOT NULL. You could drop the index when the update is complete.
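
Something like this, in SQL Server syntax (the index name is made up):

-- Speeds up finding the rows that still need updating.
CREATE INDEX ix_table_name_new_col1 ON table_name (new_col1);

-- ... run the batched update to completion ...

-- The index has served its purpose once every row is filled in.
DROP INDEX ix_table_name_new_col1 ON table_name;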

Brian Hooper