views:

754

answers:

2

Let me set up the situation. We are trying to insert a modestly high number of rows (roughly 10-20M a day) into a MyISAM table that is modestly wide:

+--------------+--------------+------+-----+---------+-------+
| Field        | Type         | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| blah1        | varchar(255) | NO   | PRI |         |       | 
| blah2        | varchar(255) | NO   | PRI |         |       | 
| blah3        | varchar(5)   | NO   | PRI |         |       | 
| blah4        | varchar(5)   | NO   | PRI |         |       | 
| blah5        | varchar(2)   | NO   | PRI |         |       | 
| blah6        | varchar(2)   | NO   | PRI |         |       | 
| blah7        | date         | NO   | PRI |         |       | 
| blah8        | smallint(6)  | NO   | PRI |         |       | 
| blah9        | varchar(255) | NO   | PRI |         |       | 
| blah10       | bigint(20)   | YES  |     | NULL    |       | 
+--------------+--------------+------+-----+---------+-------+

The only index besides that whopping primary key is on the blah7, the date field. We are using LOAD DATA INFILE and seeing what strikes me as pretty awful performance, around 2 hours to load the data. I was led to believe that LOAD DATA INFILE was orders of magnitude faster than that.

Interestingly, we have some less fat tables (5-6 fields) that we also use LOAD DATA INFILE to batch data into and we see much better performance on those. The number of records is quite a bit smaller, which leads me to think that we are running up against a buffer size limit when we load the large table, and are having to go to disk (and really, what else but going to disk would explain such slow load times?).

...which brings me to my question. What my.cnf settings are most important when dealing with LOAD DATA INFILE commands?

+1  A: 

I don't know about settings, but my money is on that composite primary key as to why you are having such poor performance.

Jordan S. Jones
A: 

Inserting into indexes in general kills performance. You may be better off removing the index before inserting data and re-indexing after insertion.

From http://forum.percona.com/s/m/983/:

Normally MySQL is rather fast loading data in MyISAM table, but there is exception, which is when it can't rebuild indexes by sort but builds them row by row instead. It can be happening due to wrong configuration (ie too small myisam_max_sort_file_size or myisam_max_extra_sort_file_size) or it could be just lack of optimization, if you're having large (does not fit in memory) PRIMARY or UNIQUE indexes.

Also check out http://www.mysqlperformanceblog.com/2007/05/24/predicting-how-long-data-load-would-take/ and http://www.linuxtopia.org/online_books/database_guides/mysql_5.1_database_reference_guide/insert-speed.html.

mattkemp