Hey all,

I was wondering if anyone had any experience with what I am about to embark on. I have several CSV files, each around a GB in size, that I need to load into an Oracle database. While most of my work after loading will be read-only, I will need to load updates from time to time. Basically I just need a good tool for loading many rows of data at a time up to my db.

Here is what I have found so far:

  1. I could use SQL*Loader to do a lot of the work

  2. I could use Bulk-Insert commands

  3. Some sort of batch insert.

Using prepared statements somehow might be a good idea. I guess I was wondering what everyone thinks is the fastest way to get this insert done. Any tips?

A: 

With that amount of data, you'd better be sure of your backing store - the free space on the disks holding your dbf datafiles.

sqlldr is script driven, very efficient - generally more efficient than a SQL script. The only thing I wonder about is the magnitude of the data. I personally would consider several to many sqlldr processes, assign each one a subset of the data, and let the processes run in parallel.
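For example, each process could get its own control file pointing at its own slice of the data. A minimal sketch - the file names, table name, and columns here are made-up placeholders, not from your setup:

    -- load_part1.ctl (hypothetical control file for one slice of the data)
    LOAD DATA
    INFILE 'data_part1.csv'
    APPEND
    INTO TABLE my_big_table
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    (id, name, amount, created_date DATE "YYYY-MM-DD")

Then start one sqlldr per control file (e.g. sqlldr userid=scott/tiger control=load_part1.ctl log=load_part1.log) and let the instances run concurrently.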

You said you wanted to load a few records at a time? That may take a lot longer than you think. Did you mean a few files at a time?

jim mcnamara
I have divided the data up into several files with the intention of letting things happen in parallel. I was thinking that piping several records in at a time would go quicker... since I would not have the overhead of interacting with the server for each record.
James
@James - Fixed-width files can be loaded in parallel with SQL*Loader or external tables. So separate your files by content - don't try to out-think Oracle.
Paul James
+3  A: 

I would be very surprised if you could roll your own utility that will outperform SQL*Loader Direct Path Loads. Oracle built this utility for exactly this purpose - the likelihood of building something more efficient is practically nil. There is also the Parallel Direct Path Load, which allows you to have multiple direct path load processes running concurrently.

From the manual:

Instead of filling a bind array buffer and passing it to the Oracle database with a SQL INSERT statement, a direct path load uses the direct path API to pass the data to be loaded to the load engine in the server. The load engine builds a column array structure from the data passed to it.

The direct path load engine uses the column array structure to format Oracle data blocks and build index keys. The newly formatted database blocks are written directly to the database (multiple blocks per I/O request using asynchronous writes if the host platform supports asynchronous I/O).

Internally, multiple buffers are used for the formatted blocks. While one buffer is being filled, one or more buffers are being written if asynchronous I/O is available on the host platform. Overlapping computation with I/O increases load performance.

There are cases where Direct Path Load cannot be used.
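As a rough sketch of what the invocation looks like (the connect string and control file names are placeholders): a parallel direct path load is just several direct loads started with PARALLEL=TRUE, each of whose control files uses APPEND:

    # hypothetical command lines - one direct path loader per input slice
    sqlldr userid=scott/tiger control=load_part1.ctl direct=true parallel=true
    sqlldr userid=scott/tiger control=load_part2.ctl direct=true parallel=true

PARALLEL=TRUE tells each session that other concurrent direct loads exist, so they don't fight over the table's segment.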

DCookie
+1 Excellent answer
Rob van Wijk
A: 

You may be able to create an external table on the CSV files and load them by SELECTing from the external table into another table. I'm not sure whether this method will be quicker overall, but it may save you the hassle of getting SQL*Loader to work, especially when you have criteria for UPDATEs.
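Something along these lines (a sketch only - the directory path, table names, and columns are invented for illustration):

    -- directory object pointing at where the CSV files live
    CREATE DIRECTORY csv_dir AS '/data/csv';

    -- external table that reads a CSV file in place
    CREATE TABLE my_data_ext (
      id     NUMBER,
      name   VARCHAR2(100),
      amount NUMBER
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY csv_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
      )
      LOCATION ('data_part1.csv')
    );

    -- initial load: direct-path insert into the real table
    INSERT /*+ APPEND */ INTO my_big_table
    SELECT * FROM my_data_ext;

    -- later updates: MERGE handles the mixed insert/update case
    MERGE INTO my_big_table t
    USING my_data_ext s
    ON (t.id = s.id)
    WHEN MATCHED THEN UPDATE SET t.name = s.name, t.amount = s.amount
    WHEN NOT MATCHED THEN INSERT (id, name, amount)
                          VALUES (s.id, s.name, s.amount);

The MERGE at the end is what makes this route attractive for the periodic updates: point the external table at the new file and run it again.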

Stellios