Scenario:
- I load some data into a local MySQL database each day, about 2 million rows;
- I have to (have to - it's an audit/regulatory thing) move to a "properly" administered server, which currently looks to be Oracle 10g;
- The server is in a different country: a network round-trip current takes 60-70 ms;
- Input is a CSV file in a denormalised form: I normalise the data before loading, each line typically results in 3-8 INSERTs across up to 4 tables;
- The load script is currently implemented in Ruby, using ActiveRecord and fastercsv. I've tried the ar-extensions gem, but it assumes that the MySQL style multiple values clause idea will work. It doesn't.
EDIT: Extremely useful answers already - thank-you! More about that pesky input file. The number of fields is variable and positions have changed a few times - my current script determines content by analysing the header row (well, fastercsv and a cunning converter do it). So a straight upload and post-process SQL wouldn't work without several versions of the load file, which is horrible. Also it's a German CSV file: semi-colon delimited (no big deal) and decimals indicated by commas (rather bigger deal unless we load as VARCHAR and text-process afterwards - ugh).
The problem:
Loading 2 million rows at about 7/sec is going to take rather more than 24 hours! That's likely to be a drawback with a daily process, not to mention that the users would rather like to be able to access the data about 5 hours after it becomes available in CSV form!
I looked at applying multiple inserts per network trip: the rather ungainly INSERT ALL...
syntax would be fine, except that at present I'm applying a unique id to each row using a sequence. It transpires that
INSERT ALL
INTO tablea (id,b,c) VALUES (tablea_seq.nextval,1,2)
INTO tablea (id,b,c) VALUES (tablea_seq.nextval,3,4)
INTO tablea (id,b,c) VALUES (tablea_seq.nextval,5,6)
SELECT 1 FROM dual;
(did I say it was ungainly?) tries to use the same id for all three rows. Oracle docus appear to confirm this.
Latest attempt is to send multiple INSERTs in one execution, e.g.:
INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,1,2);
INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,3,4);
INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,5,6);
I haven't found a way to persuade Oracle to accept that.
The Question(s)
- Have I missed something obvious? (I'd be so pleased if that turned out to be the case!)
- If I can't send multiple inserts, what else could I try?
Why Accept That One?
For whatever reason, I prefer to keep my code as free from platform-specific constructs as possible: one reason this problem arose is that I'm migrating from MySQL to Oracle; it's possible another move could occur one day for geographical reasons, and I can't be certain about the platform. So getting my database library to the point where it can use a text SQL command to achieve reasonable scaling was attractive, and the PL/SQL block accomplishes that. Now if another platform does appear, the change will be limited to changing the adapter in code: a one-liner, in all probability.