The problem is, we have a huge number of records (more than a million) to be inserted into a single table from a Java application. The records are created by the Java code; it's not a move from another table, so INSERT/SELECT won't help.

Currently, my bottleneck is the INSERT statements. I'm using PreparedStatement to speed up the process, but I can't get more than 50 records per second on a normal server. The table is not complicated at all, and there are no indexes defined on it.

The process takes too long, and the time it takes will cause problems.

What can I do to get the maximum speed (INSERT per second) possible?

Database: MS SQL 2008. Application: Java-based, using Microsoft JDBC driver.

+2  A: 

Have you looked into bulk operations?

FrustratedWithFormsDesigner
I will try it; I guess it will be my best solution. The only problem is that I have to create files and then run the operation, and I will have to handle the different scenarios that can arise around file storage and network conditions.
Iravanchi
+7  A: 

Batch the inserts. That is, send only 1000 rows at a time, rather than one row at a time, so you hugely reduce round trips/server calls.

Performing Batch Operations on MSDN for the JDBC driver. This is the easiest approach short of reengineering the code to use genuine bulk methods.

There are better ways, but this works if you are limited to generated INSERTs.
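A minimal sketch of this approach, assuming a hypothetical two-column `Records` table and a batch size of 1000 (table name and column count are illustrative):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsertSketch {

    // Builds "INSERT INTO <table> VALUES (?, ?, ...)" for the given column count.
    static String buildInsertSql(String table, int columnCount) {
        StringBuilder sb = new StringBuilder("INSERT INTO ").append(table).append(" VALUES (");
        for (int i = 0; i < columnCount; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(")").toString();
    }

    // Sends rows in batches of 1000 instead of one round trip (and one compile) per row.
    static void insertAll(Connection con, List<Object[]> rows) throws SQLException {
        con.setAutoCommit(false); // commit per batch, not per row
        try (PreparedStatement ps = con.prepareStatement(buildInsertSql("Records", 2))) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]);
                }
                ps.addBatch();
                if (++pending == 1000) {  // flush every 1000 rows
                    ps.executeBatch();
                    con.commit();
                    pending = 0;
                }
            }
            if (pending > 0) {            // flush the remainder
                ps.executeBatch();
                con.commit();
            }
        }
    }
}
```

Turning off auto-commit matters here: otherwise each batch row can still pay for its own transaction.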

gbn
I think the round-trip is a very small part of the delay: at 50 transactions per second, each query takes 20ms to run, while the round-trip is under 1ms. I have done other optimizations to remove round-trips, but they didn't help much. Unless batching the INSERTs makes a big part of SQL Server's internal processing more efficient. Does it?
Iravanchi
@Irchi: Each insert must be parsed, compiled, and executed. A batch means a lot less parsing/compiling, because 1000 inserts (for example) will be compiled in one go
gbn
@Irchi: I'd try this before re-engineering the code to use a BCP approach
gbn
A: 

Look into SQL Server's bcp utility.

This would mean a big change in your approach in that you'd be generating a delimited file and using an external utility to import the data. But this is the fastest method for inserting a large number of records into a SQL Server database and will speed up your load time by many orders of magnitude.
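One way to drive this from the Java side is to write the rows to a tab-delimited file and then shell out to bcp. This is only a sketch; the database name, table, server, and credentials below are placeholders:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class BcpExportSketch {

    // Joins each row's fields with tabs, one row per line, as bcp's character mode (-c) expects.
    static String toDelimited(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.append(String.join("\t", row)).append("\n");
        }
        return sb.toString();
    }

    static void export(List<String[]> rows, Path file) throws IOException, InterruptedException {
        Files.writeString(file, toDelimited(rows));
        // Hypothetical bcp invocation, equivalent to running on the command line:
        //   bcp MyDb.dbo.Records in data.tsv -c -S myserver -U user -P pass
        new ProcessBuilder("bcp", "MyDb.dbo.Records", "in", file.toString(),
                "-c", "-S", "myserver", "-U", "user", "-P", "pass")
                .inheritIO().start().waitFor();
    }
}
```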

Also, is this a one-time operation you have to perform or something that will occur on a regular basis? If it's one-time, I would suggest not even coding this process but performing an export/import with a combination of database utilities.

Paul Sasik
I guess BULK INSERT uses BCP internally. Am I right?
Iravanchi
+1  A: 

Have you considered using batch updates?

Manolo Santos
Thanks, I guess this can be helpful too. But I will try BULK INSERT first, it seems more promising!
Iravanchi
+2  A: 

Use BULK INSERT - it is designed for exactly what you are asking and significantly increases the speed of inserts.

Also (just in case you really do have no indexes), you may want to consider adding an index - some indexes (most notably an index on the primary key) may improve the performance of inserts.

The actual rate at which you should be able to insert records will depend on the exact data, the table structure and also on the hardware / configuration of the SQL server itself, so I can't really give you any numbers.
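As a sketch, BULK INSERT can be issued from Java as ordinary T-SQL once the data file is on (or reachable from) the database server; the table name and file path here are placeholders, and the terminators assume a tab-delimited file:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class BulkInsertSketch {

    // Builds the BULK INSERT statement; FIELDTERMINATOR/ROWTERMINATOR match a tab-delimited file.
    static String buildBulkInsertSql(String table, String serverSidePath) {
        return "BULK INSERT " + table
                + " FROM '" + serverSidePath + "'"
                + " WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '\\n', TABLOCK)";
    }

    static void run(Connection con, String table, String serverSidePath) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.execute(buildBulkInsertSql(table, serverSidePath));
        }
    }
}
```

Note that the FROM path is resolved on the server, not on the machine running the Java code.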

Kragen
I actually have one index on the PK which is clustered, and the data are inserted in PK order, so I don't think it will have any effect. I will be trying BULK INSERT; I guess it's my solution.
Iravanchi
A: 

I would recommend using an ETL engine for this. You can use Pentaho; it's free. ETL engines are optimized for bulk-loading data and for any transformation/validation that is required.

CoolBeans
+1  A: 

Is there any integrity constraint or trigger on the table? If so, dropping them before the inserts will help, but you have to be sure that you can afford the consequences.

binary_runner
good point, triggers don't help
gbn
There are two FK constraints; I was planning to remove them and give it a try. But BULK INSERT has the option of ignoring the constraints, so I guess using BULK INSERT I will have all the advantages I need.
Iravanchi