ansaurus

Question

Answer 1

+10 A:

How about shipping the csv file to the oracle db server, use SQLLoader to load the csv file into a staging table and then running a stored procedure to transform and INSERT it in the final tables?

Nuno G 2009-02-11 13:24:39

This is probably the fastest way to do it. You might not need to copy the file to the oracle server either; I think you can run SQLLoader locally.

Mr. Shiny and New 2009-02-11 14:02:21

Yes, you would probably choose to run it locally on the client. Very fast indeed.

David Aldridge 2009-02-11 14:06:23

It's something I'm considering: do the extract and transform stages to make something that sqlldr can digest (because the input file format is, well, tricky)

Mike Woodhouse 2009-02-11 14:46:45

SQL*Loader has some pretty good data manipulation/transformation tools built-in, as well as the ability to handle variable record lengths and number of fields.

DCookie 2009-02-11 17:41:58

How about handling 123,456 (German for 123.456) ?

Mike Woodhouse 2009-02-13 13:55:26

You can specify transformation functions in the SQLLoader control file. Like "REPLACE(:field1,',','.')".

Nuno G 2009-02-13 14:15:25

Answer 2

+4 A:

You could use:

insert into tablea (id,b,c)
 ( select tablea_seq.nextval,1,2 from dual union all
   select tablea_seq.nextval,3,4 from dual union all
   select tablea_seq.nextval,3,4 from dual union all
   select tablea_seq.nextval,3,4 from dual union all
   ...
 )

This works until about up to 1024 lines when I remember correctly.

You could also send it as a PL/SQL batch instruction:

BEGIN
 INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,1,2);
 INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,3,4);
 INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,5,6);
 ...
 COMMIT;
END

Kosi2801 2009-02-11 13:30:46

I usually use the SQL Batch insert mentioned here, adding a call to 'ALTER SESSION ENABLE PARALLEL DML' first.

Andrew Barnett 2009-02-11 14:16:46

Hae the first one, but the second... Oh-ho. Please hold, while I go hack my code... That works! I wonder if it scales. Hold again please... Hmm. 100 works, 200 doesn't. Limit on string size?

Mike Woodhouse 2009-02-11 15:29:41

I haven't hit a size-limit on PL/SQL size so far but to be honest I think my scripts weren't anywhere near as big as yours would grow.Nevertheless, if you cut down round-trips by roughly a factor of 100 (100 inserts at once instead of just 1) you should save lots of time.

Kosi2801 2009-02-11 16:14:13

There's a limit of 64000 "Diana nodes", which relates to the parsed Ada code. A bit like an AST, I think. For a small (5-column) table I can go to about 7500 inserts, or about 850KB of SQL. (the 200 row thing was my bug!)

Mike Woodhouse 2009-02-15 10:56:34

Answer 3

+2 A:

I would be loading the raw CSV file to a dedicated table in the database using SQL*Loader without normalising it, then running code against the data in the table to normalise it out to the various required tables. This would minimise the round trips, and is a pretty conventional approach in the Oracle community.

SQL*Loader can be a little challenging in terms of initial learning curve, but you can soon post again if you get stuck.

http://download.oracle.com/docs/cd/B19306_01/server.102/b14215/part_ldr.htm#i436326

David Aldridge 2009-02-11 14:13:17

Working without normalisation is going to be hard: the file has variable record size, variable number of fields, position determined by column header. And it's German in origin, so decimal points are commas. But a pre-load transform is definitely an option, although it's not without pain.

Mike Woodhouse 2009-02-11 14:49:10

SQL*Loader has some powerful built-in data manipulation and transformation tools. It can handle variable records sizes and number of columns as long as you can key off a value in the record.

DCookie 2009-02-11 17:39:53

If you have already written the code to parse the file then spitting it out to 4 csv files then invoking sql*loader would probably be the most efficient thing then.

David Aldridge 2009-02-11 21:13:38

Answer 4

A:

I have a quick suggestion. I'm from the MySQL world but was trained on Oracle, I think this will work.

In MySQL you can insert multiple records with a single insert statement. It looks like this:

INSERT INTO table_name (column_one, column_two, column_three, column_four)
VALUES
    ('a', 'one', 'alpha', 'uno'),        // Row 1
    ('b', 'two', 'beta', 'dos'),         // Row 2
    ('c', 'three', 'gamma', 'tres'),     // etc.
....
    ('z', 'twenty-six', 'omega', 'veintiséis');

Now obviously you can only insert into one table at once, and you wouldn't want to do 2 million records, but you could easily do 10 or 20 or 100 at a time (if you are allowed packets that big). You may have to generate this by hand, I don't know if they frameworks you are using (or any, for that matter) support making this kind of code for you.

In the MySQL world this DRAMATICALLY speeds up inserts. I assume it does all the index updates and such at the same time, but it also prevents it from having to re-parse the SQL on each insert.

If you combine this with prepared statements (so the SQL is cached and it doesn't have to be parsed out each time) and transactions (to make sure things are always in a sane state when you have to do inserts across multiple tables... I think you will be doing pretty good.

NunoG is right that you can load the CSVs directly into Oracle. You may be best off reading in the input file, generating a normalized set of CSV files (one for each table), and then loading in each of those one at a time.

MBCook 2009-02-12 16:50:30

Ah, if only! I'm actually migrating from MySQL (long story) where this works fine - pity Oracle didn't think of it. The PL/SQL block thing kinda works, though. Thanks for the input.

Mike Woodhouse 2009-02-13 13:53:37

Answer 5

A:

Instead of executing the SQL over the network you could write the Inserts to a text file, move it over the network and run it locally there.

Equistatic 2009-02-12 23:24:17

Nice lateral thinking! I'm also looking into getting access to a workstation on site and doing the whole thing over there.

Mike Woodhouse 2009-02-13 13:52:51

Answer 6

A:

SQL*Loader is an Oracle-supplied utility that allows you to load data from a flat file into one or more database tables. This will be 10-100 times faster than inserts using queries.

http://www.orafaq.com/wiki/SQL*Loader_FAQ

If SQL*Loader doesn't cut it, try a small preprocessor program that formats the file in SQL*Loader readable format.

Andomar 2009-02-15 11:16:56

ansaurus

tags:

views:

answers:

Can I send "batched" INSERTs to Oracle?

Scenario:

The problem:

The Question(s)

Why Accept That One?

related questions