I need to select data from one table and insert it into another table. Currently the SQL looks something like this:

   INSERT INTO A (x, y, z)
   SELECT x, y, z
   FROM B b
   WHERE ...

However, the SELECT is huge, returning over 2 million rows, and we think it is taking up too much memory. Informix, the DB in this case, runs out of virtual memory when the query is run.

How would I go about selecting and inserting a set of rows at a time (say 2000), given that I don't think there are any row IDs or similar to work from?

+1  A: 

You can do SELECT FIRST n * FROM Table, where n is the number of rows you want, say 2000. Also, in the WHERE clause, use an embedded SELECT that checks the target table for rows that already exist, so that the next time the statement is run it does not pick up already-inserted data.
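
A minimal sketch of that idea (untested; it assumes x uniquely identifies a row in B, and note that a later answer questions whether FIRST is even allowed outside queries that return rows to the client):

   -- sketch only: assumes x uniquely identifies a row in B
   INSERT INTO A (x, y, z)
   SELECT FIRST 2000 b.x, b.y, b.z
   FROM B b
   WHERE ...   -- the original filter conditions
     AND NOT EXISTS (SELECT 1 FROM A a WHERE a.x = b.x);

Each run copies up to 2000 not-yet-copied rows, so you repeat the statement until it inserts zero rows.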

northpole
This sounds good. How would you write a loop that keeps selecting until the SELECT returns no rows?
prmatta
Use the NOT EXISTS predicate to make sure the row you are inserting does not already exist in the new table.
northpole
A: 

I assume that you have some script this is executed from. You can loop and limit, as long as you order the rows returned by the nested SELECT and skip the ones already processed. Here is some pseudo-code.

total = SELECT COUNT(x) FROM B WHERE ...
offset = 0
while (total > 0)
  INSERT INTO A (x, y, z)
    SELECT SKIP offset FIRST 2000 x, y, z FROM B b WHERE ... ORDER BY x
  offset = offset + 2000
  total = total - 2000
end
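
As a concrete (untested) Informix rendering of one iteration, assuming a server version that accepts SKIP/FIRST inside an INSERT ... SELECT (older versions do not; see the last answer):

   -- third iteration of the loop: skip the 4000 rows already copied;
   -- the batch size of 2000 is illustrative
   INSERT INTO A (x, y, z)
   SELECT SKIP 4000 FIRST 2000 x, y, z
   FROM B b
   WHERE ...   -- the original filter conditions
   ORDER BY x;

The ORDER BY matters: without a stable ordering, SKIP/FIRST can return overlapping or missing rows across iterations.
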
Randy Simon
A: 

This explains it quickly and simply. It's old, but it should still work for most SQL databases:

http://sqlserver2000.databases.aspfaq.com/how-do-i-limit-the-number-of-rows-returned-in-my-resultset.html

Anicho
I know it's your first answer and all, but the OP is using Informix!
Martin Smith
A: 

I'm almost certain that IDS only lets you use the FIRST clause where the data is returned to the client [1], and that is something you want to avoid if at all possible.

You say you get an out of memory error (rather than, say, a long transaction aborted error)? Have you looked at the configuration of your server to ensure it has a reasonable amount of memory?

It depends in part on how big your data set is and what the constraints are (why you are doing the load across tables). But I would normally aim to determine a way of partitioning the data into loadable subsets and run those sequentially in a loop. For example, if the sequence numbers are between 1 and 10,000,000, I might run the loop ten times, with a condition on the sequence number of 'AND seqnum >= 0 AND seqnum < 1000000' and then 'AND seqnum >= 1000000 AND seqnum < 2000000', etc. Preferably in a language with the ability to substitute the ranges via variables.
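
For instance, a minimal SPL sketch of that loop (untested; the names A, B, and seqnum follow the question, and the step size is illustrative):

   -- hypothetical sketch: run the load in seqnum ranges from a stored procedure;
   -- in a logged database, watch the transaction length as well
   CREATE PROCEDURE load_in_ranges()
       DEFINE lo, hi INTEGER;
       LET lo = 0;
       WHILE lo < 10000000
           LET hi = lo + 1000000;
           INSERT INTO A (x, y, z)
               SELECT x, y, z FROM B b
               WHERE b.seqnum >= lo AND b.seqnum < hi;  -- plus the original filters
           LET lo = hi;
       END WHILE;
   END PROCEDURE;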

This is a bit of a nuisance, and you want to err on the conservative side in terms of range size (more, smaller partitions rather than fewer, bigger ones) to reduce the risk of running out of memory.


[1] Over-simplifying slightly. A stored procedure would have to count as 'the client', for example, and the communication cost in a stored procedure is (a lot) less than the cost of going to the genuine client.

Jonathan Leffler