views:

59

answers:

3

Hey, I'm trying to write about 600,000 tokens into my MySQL database table. The engine I'm using is InnoDB. The update process is taking forever :(. So my best guess is that I'm totally missing something in my code and that what I'm doing is just plain stupid.

Perhaps someone has a spontaneous idea about what seems to eat my performance:

Here is my code:

public void writeTokens(Collection<Token> tokens){

    // try-with-resources so the statement is closed even on error
    try (PreparedStatement updateToken = dbConnection.prepareStatement(
            "UPDATE tokens SET `idTag`=?, `Value`=?, `Count`=?, `Frequency`=? WHERE `idToken`=?;")) {

        for (Token token : tokens) {

            updateToken.setInt(1, 0);
            updateToken.setString(2, token.getWord());
            updateToken.setInt(3, token.getCount());
            updateToken.setInt(4, token.getFrequency());
            updateToken.setInt(5, token.getNounID());

            updateToken.executeUpdate();
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }
}

Thanks a lot!

+5  A: 

I don't have a Java-specific answer for you, but wrap the whole shebang in a transaction. If you don't, then MySQL (when writing against InnoDB) starts and commits a new transaction per update statement.

Just execute START TRANSACTION before you start your calls, and execute COMMIT after all your updates/inserts are done. I believe MySQL also defers index updates until the end of the transaction, which should improve performance considerably if you're updating indexed fields.
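In JDBC you'd normally do this via `setAutoCommit(false)` rather than issuing `START TRANSACTION` yourself. A minimal sketch of the question's method rewritten this way (the table, columns, and `Token` getters are taken from the question; the nested `Token` interface is just a stand-in for the asker's class):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collection;

class TokenWriter {

    // Same statement as in the question.
    static final String UPDATE_SQL =
        "UPDATE tokens SET idTag=?, Value=?, Count=?, Frequency=? WHERE idToken=?";

    /** Minimal stand-in for the asker's Token class. */
    interface Token {
        String getWord();
        int getCount();
        int getFrequency();
        int getNounID();
    }

    void writeTokens(Connection dbConnection, Collection<Token> tokens) throws SQLException {
        boolean oldAutoCommit = dbConnection.getAutoCommit();
        dbConnection.setAutoCommit(false); // JDBC equivalent of START TRANSACTION
        try (PreparedStatement updateToken = dbConnection.prepareStatement(UPDATE_SQL)) {
            for (Token token : tokens) {
                updateToken.setInt(1, 0);
                updateToken.setString(2, token.getWord());
                updateToken.setInt(3, token.getCount());
                updateToken.setInt(4, token.getFrequency());
                updateToken.setInt(5, token.getNounID());
                updateToken.executeUpdate();
            }
            dbConnection.commit(); // one commit instead of one per row
        } catch (SQLException e) {
            dbConnection.rollback(); // don't leave a half-done transaction open
            throw e;
        } finally {
            dbConnection.setAutoCommit(oldAutoCommit);
        }
    }
}
```

The `finally` block restores the connection's previous auto-commit mode so the method doesn't change behavior for other code sharing the connection.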

Chris Heald
or this, should have the same effect, database-wise
Nicolas78
These are all great answers ... it's sad that I can only accept one as my accepted answer, because what really did the trick was a combination of all. Thanks again @All
evermean
+4  A: 

If you have an index on one or more of the fields in your table, each update forces those indices to be updated as well, which may in fact take a while as you approach several hundred thousand entries.

PreparedStatement comes with an addBatch() method - I haven't used it myself, but if I understand it correctly, you can queue up several records in the prepared statement and then send them to the database in one go. This cuts the number of round-trips drastically - with 600,000 rows you should feel the difference :)
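A sketch of what that could look like (same statement as in the question; `BATCH_SIZE = 1000` is an arbitrary choice — anywhere from a few hundred to a few thousand rows per batch is common; the nested `Token` interface stands in for the asker's class):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collection;

class BatchedTokenWriter {

    static final int BATCH_SIZE = 1000;

    /** Minimal stand-in for the asker's Token class. */
    interface Token {
        String getWord();
        int getCount();
        int getFrequency();
        int getNounID();
    }

    /** Number of executeBatch() round-trips needed for a given row count. */
    static int batchCount(int totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize;
    }

    void writeTokens(Connection dbConnection, Collection<Token> tokens) throws SQLException {
        String sql = "UPDATE tokens SET idTag=?, Value=?, Count=?, Frequency=? WHERE idToken=?";
        try (PreparedStatement updateToken = dbConnection.prepareStatement(sql)) {
            int pending = 0;
            for (Token token : tokens) {
                updateToken.setInt(1, 0);
                updateToken.setString(2, token.getWord());
                updateToken.setInt(3, token.getCount());
                updateToken.setInt(4, token.getFrequency());
                updateToken.setInt(5, token.getNounID());
                updateToken.addBatch();           // queue instead of executing immediately
                if (++pending == BATCH_SIZE) {
                    updateToken.executeBatch();   // one round-trip for the whole batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                updateToken.executeBatch();       // flush the remainder
            }
        }
    }
}
```

With 600,000 rows and a batch size of 1,000 this is 600 round-trips instead of 600,000 — and it combines nicely with the transaction suggestion above. For MySQL specifically, adding `rewriteBatchedStatements=true` to the JDBC URL is often needed for the driver to actually coalesce the batch on the wire.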

Nicolas78
+3  A: 

Each update statement requires a roundtrip to the database. This will give you a huge performance hit.

There are a couple of ways to insert this data into the database without performing hundreds of thousands of queries:

  • Use a bulk insert (LOAD DATA INFILE).
  • Use a single insert statement that inserts multiple rows at once. You could for example insert 100 rows per insert statement.

Then you can use a single UPDATE statement, joining against the newly loaded table, to copy the data into the target table. This reduces the number of server round-trips, improving the performance.
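For the multi-row variant, the INSERT statement can be built dynamically. A sketch of the string-building part (the staging table name `tokens_staging` and its columns are assumptions for illustration, not from the question):

```java
class MultiRowInsert {

    /** Builds an INSERT with `rows` placeholder groups: VALUES (?,?,?,?),(?,?,?,?),... */
    static String buildInsert(int rows) {
        StringBuilder sql = new StringBuilder(
            "INSERT INTO tokens_staging (idToken, Value, Count, Frequency) VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) sql.append(',');
            sql.append("(?,?,?,?)");
        }
        return sql.toString();
    }
}
```

You'd prepare this once per batch size, bind 4 × N parameters per execution, and finish with something like `UPDATE tokens JOIN tokens_staging USING (idToken) SET ...` to copy the values across in one statement.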

Mark Byers