Views: 52 | Answers: 3
I am trying to get a better understanding of insertion speed and performance patterns in MySQL for a custom product. I have two tables to which I keep appending new rows. The two tables are defined as follows:

CREATE TABLE events (
 added_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 id BINARY(16) NOT NULL,
 body MEDIUMBLOB,
 UNIQUE KEY (id)) ENGINE InnoDB;

CREATE TABLE index_fpid (
 fpid VARCHAR(255) NOT NULL,
 event_id BINARY(16) NOT NULL UNIQUE,
 PRIMARY KEY (fpid, event_id)) ENGINE InnoDB;

And I keep inserting new objects into both tables (for each new object, I insert the relevant information into both tables in one transaction). At first, I get around 600 insertions/sec, but after ~30000 rows, I see a significant slowdown (around 200 insertions/sec), and then a slower but still noticeable decline.
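The per-object write pattern described above can be sketched in Python. This is a hedged sketch: the connection setup and `stream_of_objects` are hypothetical, and only the schema comes from the question.

```python
import uuid

def build_insert_params(fpid, body):
    """Build the parameter tuples for one object's two inserts.

    The 16-byte id comes from uuid4, matching the BINARY(16)
    columns in the schema above.
    """
    event_id = uuid.uuid4().bytes    # 16 raw bytes for BINARY(16)
    events_row = (event_id, body)    # INSERT INTO events (id, body)
    index_row = (fpid, event_id)     # INSERT INTO index_fpid (fpid, event_id)
    return events_row, index_row

# The loop itself needs a running MySQL server and a driver such as
# MySQLdb, so it is shown as a comment-only sketch:
#
# conn = MySQLdb.connect(db="test")
# cur = conn.cursor()
# for fpid, body in stream_of_objects():   # hypothetical source of objects
#     ev, ix = build_insert_params(fpid, body)
#     cur.execute("INSERT INTO events (id, body) VALUES (%s, %s)", ev)
#     cur.execute("INSERT INTO index_fpid (fpid, event_id) VALUES (%s, %s)", ix)
#     conn.commit()                        # one transaction per object
```

One design note: random 16-byte ids land at random positions in the secondary index on `id`, so index pages are touched all over the tree; that is part of why performance depends so strongly on the working set fitting in memory.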

I can see that as the table grows, the IO wait numbers get higher and higher. My first thought was memory taken by the index, but the tests run on a VM which has 768 MB and is dedicated to this task alone (2/3 of the memory is unused). Also, I have a hard time seeing 30000 rows taking so much memory, even more so just the indexes (the whole MySQL data dir is < 100 MB anyway). To confirm this, I allocated very little memory to the VM (64 MB), and the slowdown pattern is almost identical (i.e. the slowdown appears after the same number of insertions), so I suspect some configuration issue, especially since I am relatively new to databases.

The pattern looks as follows: [graph: insertion rate vs. row count]

I have a self-contained Python script which reproduces the issue, which I can make available if that's helpful.

Configuration:

  • Ubuntu 10.04, 32-bit, running on KVM, 760 MB allocated to it.
  • MySQL 5.1, out-of-the-box configuration with separate files for tables

[EDIT]

Thank you very much to Eric Holmberg, he nailed it. Here are the graphs after setting innodb_buffer_pool_size to a reasonable value: [graph: insertion rate after the fix]

+1  A: 

Your indexes may just need to be analyzed and optimized during the insert; they gradually get out of shape as you go along. The other option, of course, is to disable indexes entirely while inserting and rebuild them later, which should give more consistent performance.

Great link about insert speed.

See ANALYZE TABLE and OPTIMIZE TABLE.
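For reference, the statements meant here, wrapped in a tiny helper (the table name is illustrative; running them requires a live server):

```python
def maintenance_sql(table):
    """Return the index-maintenance statements suggested above."""
    return ["ANALYZE TABLE %s" % table, "OPTIMIZE TABLE %s" % table]

print(maintenance_sql("events"))
# → ['ANALYZE TABLE events', 'OPTIMIZE TABLE events']
```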

Chuck Vose
There are only two indexes: the primary key and a unique constraint, both of which enforce unique values. Unless there are deletions, there isn't anything to maintain.
OMG Ponies
I don't think I can disable the indexes (in my application, I need to make the inserts available as soon as they are written, and the inserts arrive in real time; it is not batch processing).
David Cournapeau
I trust you OMG; if you say that's the case, I eagerly await your answer so that I can learn.
Chuck Vose
+1  A: 

Edit your /etc/mysql/my.cnf file and make sure you allocate enough memory to the InnoDB buffer pool. If this is a dedicated server, you could probably use up to 80% of your system memory.

# Provide a buffer pool for InnoDB - up to 80% of memory for a dedicated database server
innodb_buffer_pool_size=614M
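The 80% figure above is a rule of thumb, not an official formula; as a quick sanity check it can be expressed as:

```python
def buffer_pool_mb(total_ram_mb, fraction=0.8):
    """Rule-of-thumb InnoDB buffer pool size: a fixed fraction of system RAM."""
    return int(total_ram_mb * fraction)

print(buffer_pool_mb(768))  # 614 -- matches the 614M setting above for a 768 MB VM
```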

The primary keys are B-trees, so inserts will always take O(log N) time, and once you run out of cache, they will start swapping like mad. When this happens, you will probably want to partition the data to keep your insertion speed up. See http://dev.mysql.com/doc/refman/5.1/en/partitioning.html for more info on partitioning.

Good luck!

Eric Holmberg
That was it, thank you very much. I will update the graphs to show the difference; it is quite striking. Thanks for the partitioning tip: I was looking into partitioning already, but I did not want to go there without understanding this issue first.
David Cournapeau
Thanks for going the extra mile to update the graphs - the second set looks great!
Eric Holmberg
A: 

Verifying that the insert doesn't violate a key constraint takes some time, and that time grows as the table gets larger. If you're interested in flat out performance, using LOAD DATA INFILE will improve your insert speed considerably.

Joshua Martell
I expected inserts to slow down as the table grows (because of the log(N) cost of the index updates). But the slowdown here is caused by increased IO, if I believe the iowait %, and that's unexpected (since my dataset is small enough to fit entirely in memory). Also, my graphs do not exhibit a log(N) slowdown.
David Cournapeau
It's somewhat rare that your entire dataset fits into memory, but I'm glad your issue is solved. `LOAD DATA INFILE` would still give you further performance improvements if your requirements allow it.
Joshua Martell