views: 269

answers: 3

I need to insert a couple hundred million records into a MySQL database. I'm batch inserting 1 million at a time; please see my code below. It seems slow. Is there any way to optimize it?

try {
        // Disable auto-commit so the batch is sent in one transaction
        connection.setAutoCommit(false);

        // Create a prepared statement
        String sql = "INSERT INTO mytable (xxx) VALUES (?)";
        PreparedStatement pstmt = connection.prepareStatement(sql);

        Object[] vals = set.toArray();
        for (int i = 0; i < vals.length; i++) {
            pstmt.setString(1, vals[i].toString());
            pstmt.addBatch();
        }

        // Execute the batch
        int[] updateCounts = pstmt.executeBatch();
        System.out.println("inserted " + updateCounts.length);
A: 

You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);. (It may be that JDBC's .addBatch() does a similar optimization now, though the MySQL addBatch used to be entirely un-optimized and just issued individual queries anyway; I don't know if that's still the case with recent drivers.)
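
A rough JDBC sketch of that multi-row approach (not from the original answer), reusing the connection, set, and single-column mytable from the question; the 1,000-row chunk size is just a starting point to tune:

// Sketch only: builds "INSERT INTO mytable (xxx) VALUES (?),(?),..." per chunk.
Object[] vals = set.toArray();
int chunk = 1000;
for (int start = 0; start < vals.length; start += chunk) {
    int rows = Math.min(chunk, vals.length - start);

    StringBuilder sql = new StringBuilder("INSERT INTO mytable (xxx) VALUES ");
    for (int r = 0; r < rows; r++) {
        sql.append(r == 0 ? "(?)" : ",(?)");
    }

    try (PreparedStatement multi = connection.prepareStatement(sql.toString())) {
        for (int r = 0; r < rows; r++) {
            multi.setString(r + 1, vals[start + r].toString());
        }
        multi.executeUpdate();
    }
    connection.commit(); // auto-commit is already off in the question's code
}

With MySQL Connector/J, adding rewriteBatchedStatements=true to the JDBC URL is supposed to make addBatch() do this rewriting for you, but check that against the driver version you are using.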

If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that vs. doing tens of millions of inserts.
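
For reference, a hedged sketch of issuing that statement through JDBC; the file path, delimiter, and column list are assumptions to adapt, and LOCAL requires local-infile to be allowed on the server plus allowLoadLocalInfile=true on a Connector/J connection:

// Sketch only: the path, delimiter, and column list below are placeholders.
try (Statement stmt = connection.createStatement()) {
    stmt.execute(
        "LOAD DATA LOCAL INFILE '/tmp/mytable.csv' " +
        "INTO TABLE mytable " +
        "FIELDS TERMINATED BY ',' " +
        "LINES TERMINATED BY '\\n' " +
        "(xxx)");
}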

nos
LOAD DATA INFILE might be a good alternative, but my input file needs a cleanup. I'm only interested in inserting certain rows where the second token matches a string (space-delimited tokens); is LOAD DATA INFILE flexible enough to filter rows?
I don't think it can filter, but you can clean up the data yourself: write a new file with the cleaned data and load that file (a rough sketch of that pre-filter step follows below).
nos
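
A minimal sketch of that pre-filtering pass, assuming plain space-delimited lines; the file names and the target string are placeholders:

// Keep only lines whose second space-delimited token equals target, then feed
// the cleaned file to LOAD DATA INFILE. Needs the java.io imports.
String target = "some-value";
try (BufferedReader in = new BufferedReader(new FileReader("raw.txt"));
     BufferedWriter out = new BufferedWriter(new FileWriter("cleaned.txt"))) {
    String line;
    while ((line = in.readLine()) != null) {
        String[] tokens = line.split(" ");
        if (tokens.length > 1 && tokens[1].equals(target)) {
            out.write(line);
            out.newLine();
        }
    }
}
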
A: 

If:

  1. It's a new table, or the amount to be inserted is greater than the already inserted data
  2. There are indexes on the table
  3. You do not need other access to the table during the insert

Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
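
A quick sketch of bracketing the bulk insert with those statements from JDBC (mytable is taken from the question); as far as I know only non-unique indexes are affected, and MyISAM tables benefit the most:

try (Statement stmt = connection.createStatement()) {
    stmt.execute("ALTER TABLE mytable DISABLE KEYS");
}

// ... run the batched or multi-row inserts here ...

try (Statement stmt = connection.createStatement()) {
    stmt.execute("ALTER TABLE mytable ENABLE KEYS"); // rebuilds the indexes in one pass
}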

Wrikken
A: 

You may try using a DDBulkLoad object.

// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
Lalith