I have written a program in C to parse large XML files and then create files of INSERT statements. A separate process ingests those files into a MySQL database. The data will serve as an indexing service so that users can find documents easily.

I have chosen InnoDB for its row-level locking. The C program will generate anywhere from 500 to 5 million INSERT statements on a given invocation.

What is the best way to get all this data into the database as quickly as possible? The other thing to note is that the DB is on a separate server. Is it worth moving the files over to that server to speed up inserts?

EDIT: This table won't really be updated, but rows will be deleted.

+1  A: 

I'd do at least these things, according to this link (a sketch follows the list):

  1. Move the files to the database server and connect over the Unix socket
  2. Generate a LOAD DATA INFILE file instead of the INSERTs
  3. Disable indexes during the load
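
A minimal sketch of steps 2 and 3, run on the database server over the Unix socket (e.g. mysql --socket=/var/run/mysqld/mysqld.sock); the table name, file path, and the MyISAM-style DISABLE KEYS are illustrative assumptions, not from the original answer:

ALTER TABLE documents DISABLE KEYS;  -- step 3: skip per-row index maintenance (MyISAM)
LOAD DATA INFILE '/tmp/documents.tsv' INTO TABLE documents;  -- step 2: bulk load the generated file
ALTER TABLE documents ENABLE KEYS;   -- rebuild the indexes in a single pass
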
Vinko Vrsalovic
+11  A: 
  • Use the mysqlimport tool or the LOAD DATA INFILE command (see the sketch below).
  • Temporarily disable any indexes that you don't need for data integrity.
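
For instance, a sketch with an explicit file format; the table, columns, and path are hypothetical (mysqlimport is essentially a command-line wrapper around this same statement, and derives the table name from the file name):

LOAD DATA INFILE '/tmp/documents.txt'
INTO TABLE documents
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(id, title, url);
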
divideandconquer.se
+1  A: 

MySQL with the standard table format (MyISAM) is wonderfully fast as long as the table is write-only, so the first question is whether you are going to be updating or deleting. If not, don't go with InnoDB: there's no need for locking if you are just appending. You can truncate or rename the table periodically to deal with table size (see the sketch below).
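
A minimal sketch of the rename approach; the table names are hypothetical, and the multi-table RENAME is atomic in MySQL:

CREATE TABLE documents_new LIKE documents;  -- empty table with the same structure
RENAME TABLE documents TO documents_old, documents_new TO documents;  -- atomic swap
DROP TABLE documents_old;  -- or archive it first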

le dorfier
A: 

1. Make sure you use a transaction (see the sketch after this list).

Transactions eliminate the per-INSERT sync-to-disk cycle; instead, all of the disk I/O is performed when you COMMIT the transaction.

2. Make sure to use connection compression

Raw text compresses well; a gzip-compressed stream can mean as much as a 90% bandwidth saving in some cases.

3. Use the multi-row INSERT syntax where possible

INSERT INTO TableName (Col1, Col2) VALUES (1,1), (1,2), (1,3);

(Less text to send, and a shorter round trip.)
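
A minimal sketch combining points 1 and 3; the table and values are hypothetical. Compression (point 2) is a connection option rather than SQL, e.g. the mysql client's --compress flag or mysql_options(conn, MYSQL_OPT_COMPRESS, 0) in the C API:

START TRANSACTION;
INSERT INTO documents (id, title) VALUES
  (1, 'first'),
  (2, 'second'),
  (3, 'third');
-- ... more multi-row INSERTs ...
COMMIT;  -- all the disk I/O for the batch happens here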

Kent Fredric
If it's a write-only table, and he can be persuaded to use the standard MySQL table format, that's all overhead, especially the overhead of transactions you don't need.
le dorfier
Do you seriously think this is faster than LOAD DATA?
le dorfier
heh, probably not, but if LOAD DATA was not an option, the rest makes sense.
Kent Fredric
A: 

If you can't use LOAD DATA INFILE as others have suggested, use prepared statements for the inserts (see the sketch below).
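
A minimal sketch using MySQL's server-side PREPARE/EXECUTE syntax; the table and columns are hypothetical (from C, the equivalents are mysql_stmt_prepare and mysql_stmt_execute):

PREPARE ins FROM 'INSERT INTO documents (id, title) VALUES (?, ?)';
SET @id = 1, @title = 'first document';
EXECUTE ins USING @id, @title;  -- re-EXECUTE with fresh values for each row
DEALLOCATE PREPARE ins;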

R. Bemrose
A: 

Really depends on the engine. If you're using InnoDB, do use transactions (you can't avoid them anyway: with autocommit on, each statement implicitly runs in its own transaction), but make sure they're neither too big nor too small (see the sketch below).

If you're using MyISAM, transactions are meaningless. You may achieve better insert speed by disabling and re-enabling indexes around the load, but that is only a win on an empty table.

If you start with an empty table, that's generally best.

LOAD DATA is a winner either way.
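
A sketch of the transaction-sizing point; the batch size and table are illustrative:

SET autocommit = 0;
INSERT INTO documents (id, title) VALUES (1, 'a'), (2, 'b');
-- ... repeat until a few thousand rows are pending, then:
COMMIT;  -- big enough to amortize the disk sync, small enough to keep the log manageable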

MarkR