Views: 543 · Answers: 7

I am trying to optimize one part of my code that inserts data into MySQL. Should I chain INSERTs to make one huge multiple-row INSERT, or are multiple separate INSERTs faster?

A: 

Try it and see?

gms8994
I could do that via the BENCHMARK() function, but I am sure this issue has already been settled and I don't need to duplicate the benchmarking research.
dusoft
John MacIntyre
Correct, although it's easier to read the wisdom of others who can back it up with evidence, such as a quote from the manual...
dusoft
An ounce of test it yourself on your system is worth a pound of someone else's tests on some other system. On the other hand, "A coupla months in the laboratory can save a coupla hours in the library." -- Westheimer's Discovery
Craig Trader
When it is simple to set up profiling (as in this situation), relying only on others' opinions and theory from books is asking for trouble. There are many factors, such as hardware, network, and design approach, that can affect speed. Checking it on your platform is the only way to be sure.
Ash
+3  A: 

Send as many inserts across the wire at one time as possible. The actual insert speed should be the same, but you will see performance gains from the reduction of network overhead.
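For illustration, a minimal sketch of what that looks like with JDBC (the items table, its columns, and the connection details are made up for the example):

    import java.sql.*;

    public class MultiRowInsert {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/test", "user", "pass");
                 Statement stmt = conn.createStatement()) {
                // One round trip carries three rows, instead of three
                // round trips carrying one row each.
                stmt.executeUpdate(
                    "INSERT INTO items (name, qty) VALUES ('a', 1), ('b', 2), ('c', 3)");
            }
        }
    }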

RC
+3  A: 

In general, the fewer calls to the database the better (meaning faster and more efficient), so try to code the inserts in a way that minimizes database accesses. Remember, unless you're using a connection pool, each database access has to create a connection, execute the SQL, and then tear down the connection. Quite a bit of overhead!
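To make that overhead concrete, a small sketch (hypothetical items table and credentials) that pays the connection cost once and reuses the connection for every statement:

    import java.sql.*;

    public class ReuseConnection {
        public static void main(String[] args) throws SQLException {
            // Connect once, instead of connecting and tearing down per INSERT.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/test", "user", "pass");
                 PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO items (name, qty) VALUES (?, ?)")) {
                for (int i = 0; i < 1000; i++) {
                    ps.setString(1, "item" + i);
                    ps.setInt(2, i);
                    // Still one round trip per row; this only removes the
                    // connect/teardown cost, not the network cost.
                    ps.executeUpdate();
                }
            }
        }
    }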

ennuikiller
What if a persistent connection is used?
dusoft
There is still overhead. Transit time alone (to and from for each separate insert) will quickly be perceptible if you're doing thousands of inserts.
RC
+2  A: 

In general, multiple inserts will be slower because of the connection overhead. Doing multiple inserts at once reduces the per-insert cost of that overhead.

Depending on which language you are using, you can create a batch in your programming/scripting language before going to the database, adding each insert to the batch, and then execute the whole batch with a single connect operation. Here's an example in Java.
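A minimal JDBC sketch of that pattern (the items table and connection details are placeholders):

    import java.sql.*;

    public class BatchInsert {
        public static void main(String[] args) throws SQLException {
            // rewriteBatchedStatements=true asks MySQL Connector/J to rewrite
            // the batch into multi-row INSERTs on the wire.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/test?rewriteBatchedStatements=true",
                    "user", "pass");
                 PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO items (name, qty) VALUES (?, ?)")) {
                for (int i = 0; i < 1000; i++) {
                    ps.setString(1, "item" + i);
                    ps.setInt(2, i);
                    ps.addBatch();     // accumulate rows client-side
                }
                ps.executeBatch();     // send the whole batch in one go
            }
        }
    }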

Chris Williams
+18  A: 

http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html

The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:

  • Connecting: (3)
  • Sending query to server: (2)
  • Parsing query: (2)
  • Inserting row: (1 × size of row)
  • Inserting indexes: (1 × number of indexes)
  • Closing: (1)

From this it should be obvious that sending one large statement will save you an overhead of 7 (in the units above) per insert statement. Reading further, the text also says:

If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
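For example, here is a hedged sketch of building such a multiple-VALUES statement from code (Java to match the earlier answer; the items table is hypothetical):

    import java.sql.*;
    import java.util.List;

    public class MultiValuesInsert {
        // Builds "INSERT ... VALUES (?, ?), (?, ?), ..." for the given row count.
        static String buildSql(int rows) {
            StringBuilder sb = new StringBuilder("INSERT INTO items (name, qty) VALUES ");
            for (int i = 0; i < rows; i++) {
                sb.append(i == 0 ? "(?, ?)" : ", (?, ?)");
            }
            return sb.toString();
        }

        // Inserts all rows in one statement; assumes names is non-empty.
        static void insertAll(Connection conn, List<String> names) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(buildSql(names.size()))) {
                int p = 1;
                for (String name : names) {
                    ps.setString(p++, name);
                    ps.setInt(p++, 1);
                }
                // Connecting, sending, and parsing are paid once, not per row.
                ps.executeUpdate();
            }
        }
    }

Just keep the resulting statement under the server's max_allowed_packet limit.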

mbarkhau
thank you very much for finding the part in the manual I couldn't...
dusoft
A: 

A major factor will be whether you're using a transactional engine and whether you have autocommit on.

Autocommit is on by default and you probably want to leave it on; therefore, each insert you do runs in its own transaction. This means that if you do one insert per row, you're going to be committing a transaction for each row.

Assuming a single thread, that means the server needs to sync some data to disc for EVERY ROW. It needs to wait for the data to reach a persistent storage location (hopefully the battery-backed RAM in your RAID controller). This is inherently rather slow and will probably become the limiting factor in these cases.

I'm of course assuming that you're using a transactional engine (usually innodb) AND that you haven't tweaked the settings to reduce durability.

I'm also assuming that you're using a single thread to do these inserts. Using multiple threads muddies things a bit, because some versions of MySQL have working group commit in InnoDB: multiple threads doing their own commits can share a single write to the transaction log, which is good because it means fewer syncs to persistent storage.

Either way, the upshot is that you REALLY WANT TO USE multi-row inserts.

There is a limit beyond which they get counterproductive, but in most cases it's at least 10,000 rows. So if you batch them up to 1,000 rows, you're probably safe.

If you're using MyISAM, there's a whole other load of things, but I'll not bore you with those. Peace.
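A minimal sketch of the chunked-commit pattern in JDBC (the items table and the 1,000-row chunk size are illustrative):

    import java.sql.*;

    public class ChunkedCommit {
        static void load(Connection conn, int totalRows) throws SQLException {
            conn.setAutoCommit(false);  // one sync per commit, not per row
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO items (name, qty) VALUES (?, ?)")) {
                for (int i = 1; i <= totalRows; i++) {
                    ps.setString(1, "item" + i);
                    ps.setInt(2, i);
                    ps.addBatch();
                    if (i % 1000 == 0) {  // commit in chunks of 1,000 rows
                        ps.executeBatch();
                        conn.commit();
                    }
                }
                ps.executeBatch();        // flush and commit any remainder
                conn.commit();
            }
        }
    }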

MarkR
A: 

As answered before, you might want to:

  • check that auto-commit is off
  • open a connection
  • send multiple batches of inserts in a single transaction (a batch size of about 4,000-10,000 rows, say)
  • close the connection

As an extra, depending on how well your server scales (it's definitely fine with PostgreSQL, Oracle, and MSSQL), do the above with multiple threads and multiple connections, as sketched below.
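A rough sketch of that multi-threaded variant (thread count, row counts, and the items table are all made up for illustration):

    import java.sql.*;
    import java.util.concurrent.*;

    public class ParallelLoad {
        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int t = 0; t < 4; t++) {
                final int worker = t;
                pool.submit(() -> {
                    // Each thread gets its own connection; JDBC connections
                    // should not be shared across threads.
                    try (Connection conn = DriverManager.getConnection(
                            "jdbc:mysql://localhost/test", "user", "pass");
                         PreparedStatement ps = conn.prepareStatement(
                             "INSERT INTO items (name, qty) VALUES (?, ?)")) {
                        conn.setAutoCommit(false);
                        for (int i = 0; i < 5000; i++) {
                            ps.setString(1, "w" + worker + "-" + i);
                            ps.setInt(2, i);
                            ps.addBatch();
                        }
                        ps.executeBatch();
                        conn.commit();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }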

Antibarbie