views: 129
answers: 2

I have a database in which I need to insert batches of data (around 500k records at a time). I was testing with Derby and was seeing insert times of about 10-15 minutes for this many records (I was doing a batch insert in Java).

Does this time seem slow (working on your average laptop)? And are there approaches to speeding it up?

thanks,

Jeff

A: 

Do the tables have a lot of indexes? A lot of time could be spent updating these indexes.

GreenieMeanie
this is more appropriate if placed as a "comment"
Joset
there are only 2 indexes in it.
Jeff Storey
+3  A: 

This time seems perfectly reasonable, and is in agreement with times I have observed. If you want it to go faster, you need to use bulk insert options and disable safety features:

  • Use PreparedStatements and batches of 5,000 to 10,000 records unless it MUST be one transaction (see the sketch after this list)
  • Use bulk loading options in the DBMS
  • Disable integrity checks temporarily for insert
  • Disable indexes temporarily, or drop them and re-create them after the insert
  • Disable transaction logging and re-enable afterward.
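
For the first point, here is a minimal JDBC sketch of batched inserts with a PreparedStatement and auto-commit turned off. The JDBC URL, table, and column names (myDb, my_table, id, name) are placeholders for illustration, not taken from the question:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class BatchInsertExample {
        // Hypothetical row holder; substitute your own record structure.
        public static class Row {
            final int id;
            final String name;
            Row(int id, String name) { this.id = id; this.name = name; }
        }

        public static void insertAll(List<Row> rows) throws SQLException {
            String url = "jdbc:derby:myDb";  // placeholder Derby URL
            String sql = "INSERT INTO my_table (id, name) VALUES (?, ?)";
            int batchSize = 5_000;

            try (Connection conn = DriverManager.getConnection(url);
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                conn.setAutoCommit(false);   // commit once per batch, not per row
                int count = 0;
                for (Row r : rows) {
                    ps.setInt(1, r.id);
                    ps.setString(2, r.name);
                    ps.addBatch();
                    if (++count % batchSize == 0) {
                        ps.executeBatch();   // send the accumulated statements
                        conn.commit();
                    }
                }
                ps.executeBatch();           // flush any remaining rows
                conn.commit();
            }
        }
    }

Committing every few thousand rows keeps each transaction (and the log) small while avoiding the overhead of a commit per row.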

EDIT: Database transactions are limited by disk I/O, and on laptops and most hard drives, the important number is seek time for the disk.

Laptops tend to have rather slow disks, at 5400 rpm. At this speed, seek time is about 5 ms. If we assume one seek per record (an over-estimate in most cases), it would take 40 minutes (500000 * 5 ms) to insert all rows. Now, the use of caching mechanisms and sequencing mechanisms reduces this somewhat, but you can see where the problem comes from.
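
As a back-of-envelope check of that figure, under the same assumption of one 5 ms seek per record, the result lands in the same ballpark as the 40 minutes quoted above:

    public class SeekEstimate {
        public static void main(String[] args) {
            long records = 500_000;
            double seekMs = 5.0;   // assumed average seek time on a 5400 rpm disk
            double minutes = records * seekMs / 1000.0 / 60.0;
            System.out.printf("~%.0f minutes%n", minutes);  // prints ~42 minutes
        }
    }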

I am (of course) vastly oversimplifying the problem, but you can see where I'm going with this; it's unreasonable to expect databases to perform at the same speed as sequential bulk I/O. You've got to apply some sort of indexing to your record, and that takes time.

BobMcGee
That makes sense. Thanks.
Jeff Storey