views:

1514

answers:

5
+2  Q: 

Slow MySQL inserts

I am using and working on software which uses MySQL as a backend engine (it can use others such as PostgreSQL or Oracle or SQLite, but this is the main application we are using). The software was design in such way that the binary data we want to access is kept as BLOBs in individual columns (each table has one BLOB column, other columns have integers/floats to characterize the BLOB, and one string column with the BLOB's MD5 hash). The tables have typically 2, 3 or 4 indexes, one of which is always the MD5 column, which is made UNIQUE. Some tables already have millions of entries, and they have entered the multi-gigabyte in size. We keep separate per-year MySQL databases in the same server (so far). The hardware is quite reasonable (I think) for general applications (a Dell PowerEdge 2U-form server).

MySQL SELECT queries are relatively fast. There's little complaint there, since these are (most of the time) in batch mode. However, INSERT queries take a long time, which increases with table size (number of rows). Admittedly, this is because the MD5 column is of type UNIQUE and so each INSERT has to figure out whether each new row has a corresponding, already-inserted, MD5 string. And it's not too strange (I think) if the performance gets worse if there are other indexes (not unique). But I still can't put my mind to rest that this software architecture choice (I suspect keeping BLOBs in the table row instead of disk has a significant, negative impact) is not the best choice. Insertions are not critical, but it is an annoying feeling to have.

Does anyone have experience in similar situations? With MySQL, or even other (preferably Linux-based) RDBMes? Any insights you would care to provide, maybe some performance figures?

BTW, the working language is C++ (which wraps C calls to MySQL's API).

+1  A: 

Are you using MyISAM?
AFAIK MyISAM has a very good read-performance, but bad write performance.

InnoDB should be balanced in speed.

ppuschmann
Yes, I forgot that detail. The server we're using uses MyISAM.
jbatista
A: 

See Speed of INSERT Statements. Do you have frequent MD5 collisions? I believe these should not happen too many times, so maybe you can use something like INSERT ... ON DUPLICATE to handle the collisions. If you have specific insert periods, you can disable keys for the time of the insert and restore them later. Another option is to use replication, using a master machine for the inserts and a slave for the selects.

Yuval F
Thanks for the suggestions, I'll look into them.
jbatista
A: 

It could be a time for horizontal partitioning and moving blob field into a separate table. In this article in 'A Quick Side Note on Vertical Partitioning' author removes a larger varchar field from a table and it increases speed of a query about order of magnitude.

The reason is physical traversal of the data on a disk becomes significantly faster if there is less space to cover, so moving bigger fields elsewhere increases performance.

Also (and you probably do it already) it is beneficial to decrease the size of your index column to its absolute minumum (char(32) in ascii encoding for md5), because size of the key is directly proportional to the speed of its use.

If you do multiple inserts at a time with InnoDB tables you can significantly increase speed of inserts by wrapping them into transaction and doing mupliple inserts in one query:

START TRANSACTION
INSERT INTO x (id, md5, field1, field2) values (1, '123dab...', 'data1','data2'),(2,'ab2...','data3','data4'),.....;
COMMIT
dimus
The BEGIN TRANSACTION choice seems the most straightforward solution right now. However I have a few questions, since I'm not familiar with it (I'll keep RTFM in the meantime). 1) Since the MySQL's API is in C, it might happen we mess up and try to insert VALUES which point to null. This will almost certainly cause the client app to die with SEGFAULT. Will the transaction be "dirty" in this case? Any need to clean up? How?2) We're using MyISAM. Switching to InnoDB now can be painful and/or dangerous. Do you know of TRANSACTION performance on MyISAM?Thanks in advance for your feedback.
jbatista
MhISAM is not transactional, that means you can safely ignore everything I wrote about transactions, however the insert statement with several hundreds or thousands inserts in one query will speed things up in MyISAM as well.
dimus
OK. Do you know if the INSERT INTO syntax imposes a limit to the length of the query string? I.o.w., does MySQL accept an INSERT INTO query string with a length of (for example) hundreds of millions of characters? It smells like trouble to me.
jbatista
The limit is determined by max_allowed_packet in configuration file. I keep mine 16M. I usually add about 10000 records in one insert statement
dimus
A: 

Does your data fit in RAM? If not, get more RAM until that becomes uneconomic (16G is usually about the point for most people).

Then, do your indexes fit in the MyISAM key buffer?

If you're running a 32-bit OS, don't. Once you're on a 64-bit OS, set the key buffer to be approx 1/3 of the ram. RAM is used by the OS's cache to cache data files (which does little for inserts but is beneficial for selects).

Having multi-gigabyte tables in MyISAM can be a pain because in the event of an unclean shutdown, very lengthy repair operation(s) are required, but

Don't switch MySQL engines without significant validation of your application, it will change the behaviour in many ways (not just performance). It will affect disc space usage.

MarkR
A: 

I asked a somewhat-related question today as well.

One of the answers provided is to consider the INSERT DELAYED so that it goes into the insert queue, and is handled when the db is not as busy.

warren
Based on a reply in this thread, I've looked into MySQL's INSERT DELAYED documentation. They claim that care should be used when opting for INSERT DELAYED, because it can deteriorate overall performance in single-client insertions compared with "normal" INSERTs.
jbatista
ok - I did not see that myself, though I was also looking to be able to speed-up the script that's performing the inserts rather than to the performance of the insert itself
warren
it is worth noting that INSERT DELAYED works with MyASAM, but not InnoDB
dimus