views: 483

answers: 5

I have a table with ~30,000,000 rows that I need to iterate through, manipulate the data for each row individually, then save the data from the row to a file on a local drive.

What is the most efficient way to loop through all the rows in the table using SQL for Oracle? I've been googling but can see no straightforward way of doing this. Please help. Keep in mind I do not know the exact number of rows, only an estimate.

EDIT FOR CLARIFICATION:

We are using Oracle 10g, I believe. The row data contains BLOB data (zipped text files and XML files) that will be read into memory and loaded into a custom object, where it will then be updated/converted using .Net DOM access classes, rezipped, and stored on a local drive.

I do not have much database experience whatsoever - I planned to use straight SQL statements with ADO.Net + OracleCommands. No performance restrictions really. This is for internal use. I just want to do it the best way possible.

+1  A: 

You could try using ROWNUM queries to grab chunks until a chunk comes back empty.

This is a good article on rownum queries: http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html

If you don't feel like reading, jump directly to the "Pagination with ROWNUM" section at the end for an example query.
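If it helps to see the shape of it, here is a minimal ADO.NET sketch of that pattern, assuming ODP.NET (Oracle.DataAccess.Client); the connection string, MY_TABLE, and its id column are placeholders for your actual schema:

    using System;
    using Oracle.DataAccess.Client;

    static class PagedReader
    {
        static void ReadInPages(string connectionString)   // placeholder connection string
        {
            const int pageSize = 1000;
            using (OracleConnection conn = new OracleConnection(connectionString))
            {
                conn.Open();
                for (int minRow = 0; ; minRow += pageSize)
                {
                    using (OracleCommand cmd = conn.CreateCommand())
                    {
                        // The "Pagination with ROWNUM" shape from the article:
                        // the inner query fixes the order, the middle query caps
                        // ROWNUM, and the outer query skips rows before this page.
                        cmd.CommandText =
                            @"SELECT *
                                FROM (SELECT a.*, ROWNUM rnum
                                        FROM (SELECT * FROM MY_TABLE ORDER BY id) a
                                       WHERE ROWNUM <= :maxRow)
                               WHERE rnum > :minRow";
                        cmd.BindByName = true;   // ODP.NET binds by position by default
                        cmd.Parameters.Add(new OracleParameter("maxRow", minRow + pageSize));
                        cmd.Parameters.Add(new OracleParameter("minRow", minRow));

                        int rowsInPage = 0;
                        using (OracleDataReader reader = cmd.ExecuteReader())
                        {
                            while (reader.Read())
                            {
                                rowsInPage++;
                                // ... do work on the row ...
                            }
                        }
                        if (rowsInPage == 0) break;   // ran past the last page
                    }
                }
            }
        }
    }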

Jon Quarfoth
A: 

It's generally preferable to use set-based operations when working with a large number of rows; you get a significant performance benefit over processing row by row.

After processing the data, you should be able to dump it from the table into a file in one go.

The viability of this depends on the processing you need to perform on the rows, although it is possible in most cases to avoid using a loop. Is there some specific requirement which prevents you from processing all rows at once?

If iterating through the rows is unavoidable, bulk binding can help: FORALL for bulk DML operations, or BULK COLLECT for bulk fetches (SELECT ... BULK COLLECT INTO).
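For what it's worth, if some of the per-row work really could be expressed inside the database, a bulk-bound anonymous PL/SQL block run through ADO.NET might look like the sketch below. MY_TABLE, its id column, and the processed_flag update are all hypothetical stand-ins for real processing:

    using Oracle.DataAccess.Client;

    static class BulkProcessor
    {
        // Hypothetical sketch: MY_TABLE, id, and processed_flag are made-up names.
        static void Run(OracleConnection conn)
        {
            using (OracleCommand cmd = conn.CreateCommand())
            {
                cmd.CommandText = @"
                    DECLARE
                        TYPE id_tab IS TABLE OF MY_TABLE.ID%TYPE;
                        v_ids id_tab;
                        CURSOR c IS SELECT id FROM MY_TABLE;
                    BEGIN
                        OPEN c;
                        LOOP
                            -- fetch 1,000 ids per round trip instead of one
                            FETCH c BULK COLLECT INTO v_ids LIMIT 1000;
                            EXIT WHEN v_ids.COUNT = 0;
                            -- one context switch for the whole batch
                            FORALL i IN 1 .. v_ids.COUNT
                                UPDATE MY_TABLE SET processed_flag = 'Y'
                                 WHERE id = v_ids(i);
                        END LOOP;
                        CLOSE c;
                        COMMIT;
                    END;";
                cmd.ExecuteNonQuery();
            }
        }
    }

Given that the real work here (unzipping and DOM manipulation) happens in .NET, this only applies if part of the processing can be pushed into the database.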

Jason
A: 

My initial solution was to do something like this, as I have access to an id number (pseudocode):

            int num_rows = 100;
            int base = 0;
            int ceiling = num_rows;

            loop:
                select * from MY_TABLE where id >= base and id < ceiling;
                iterate through retrieved rows, do work;
                base = ceiling;
                ceiling += num_rows;
            until no more rows come back

But I feel that this might not be the most efficient or best way to do it...
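For comparison, a runnable version of that sketch in ADO.NET might look something like the following, assuming ODP.NET; the connection string, MY_TABLE, and id are placeholders, and querying MAX(id) up front keeps gaps in the id sequence from ending the loop early:

    using System;
    using Oracle.DataAccess.Client;

    static class ChunkedExport
    {
        static void Run(string connectionString)   // placeholder connection string
        {
            const int chunkSize = 100;
            using (OracleConnection conn = new OracleConnection(connectionString))
            {
                conn.Open();

                // Highest id, so gaps in the sequence don't stop the loop early.
                long maxId;
                using (OracleCommand max = new OracleCommand(
                           "SELECT NVL(MAX(id), 0) FROM MY_TABLE", conn))
                {
                    maxId = Convert.ToInt64(max.ExecuteScalar());
                }

                for (long floor = 0; floor <= maxId; floor += chunkSize)
                {
                    using (OracleCommand cmd = new OracleCommand(
                               "SELECT * FROM MY_TABLE WHERE id >= :floor AND id < :ceiling",
                               conn))
                    {
                        cmd.Parameters.Add(new OracleParameter("floor", floor));
                        cmd.Parameters.Add(new OracleParameter("ceiling", floor + chunkSize));
                        using (OracleDataReader reader = cmd.ExecuteReader())
                        {
                            while (reader.Read())
                            {
                                // ... do work on the row ...
                            }
                        }
                    }
                }
            }
        }
    }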

JimDaniel
A: 

It sounds like you need the entire dataset before you can do any data manipulation, since it is a BLOB. I would just use DataAdapter.Fill and then hand the DataSet over to the custom object to iterate through, do its manipulation, write the end object to disk, and then zip it.

mcauthorn
Each row's BLOB is an independent zip file that needs to be read into memory, updated, then rezipped and saved to the local drive. If this is the case, do you still recommend DataAdapter.Fill?
JimDaniel
No, then I would just use a DataReader: read each row in, work on that row, and then move on to the next one.
mcauthorn
+2  A: 

You need to read 30M rows from an Oracle DB and write the BLOB in each row (one zipped XML/text file in a single BLOB column per row?) out as 30M files on the local computer's file system?

The obvious solution is to open an ADO.NET DataReader on SELECT * FROM tbl WHERE <range> so you can work in batches. Read the BLOB from the reader into your API, do your stuff and write out the file. I would probably try to write the program so that it can run from many computers, each doing its own range - your bottleneck is most likely going to be the unzipping, manipulation and rezipping, since many consumers can probably stream data from that table without a noticeable effect on server performance.
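A sketch of what one range's worker could look like, assuming ODP.NET; MY_TABLE, the id and data columns, and the output path are placeholders. CommandBehavior.SequentialAccess streams each BLOB through GetBytes instead of buffering whole rows:

    using System;
    using System.Data;
    using System.IO;
    using Oracle.DataAccess.Client;

    static class RangeWorker
    {
        // MY_TABLE, id, data, and the output directory are placeholders.
        static void ProcessRange(OracleConnection conn, long low, long high)
        {
            using (OracleCommand cmd = new OracleCommand(
                       "SELECT id, data FROM MY_TABLE WHERE id >= :low AND id < :high",
                       conn))
            {
                cmd.Parameters.Add(new OracleParameter("low", low));
                cmd.Parameters.Add(new OracleParameter("high", high));

                using (OracleDataReader reader =
                           cmd.ExecuteReader(CommandBehavior.SequentialAccess))
                {
                    byte[] buffer = new byte[64 * 1024];
                    while (reader.Read())
                    {
                        long id = reader.GetInt64(0);   // SequentialAccess: read columns in order
                        using (MemoryStream blob = new MemoryStream())
                        {
                            // pull the BLOB through GetBytes in 64K pieces
                            long offset = 0, read;
                            while ((read = reader.GetBytes(1, offset, buffer, 0,
                                                           buffer.Length)) > 0)
                            {
                                blob.Write(buffer, 0, (int)read);
                                offset += read;
                            }
                            byte[] zipped = blob.ToArray();
                            // ... unzip, run the .NET DOM updates, rezip ...
                            File.WriteAllBytes(
                                Path.Combine(@"C:\output", id + ".zip"), zipped);
                        }
                    }
                }
            }
        }
    }

Each worker instance gets its own [low, high) range, which is what makes splitting the job across machines straightforward.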

I doubt you'll be able to do this with set-based operations internal to the Oracle database, and I would also be thinking about the file system and how you are going to organize so many files (and whether you have space - remember the size taken up by a file on the file system is always a whole multiple of the file-system block size).

Cade Roux
This is what I had in mind initially; I just wanted to check with someone with more experience first. Thanks!
JimDaniel