Hi everyone,

I have an issue with configuring MySQL (MyISAM) properly so that a bulk insert (LOAD DATA INFILE) runs fast.

There is a 6 GB text file to be imported: 15 million rows, 16 columns (some INT, some VARCHAR(255), one VARCHAR(40), one CHAR(1), some DATETIME, one MEDIUMTEXT).

The relevant my.cnf settings:

key_buffer  = 800M
max_allowed_packet = 160M
thread_cache_size = 80
myisam_sort_buffer_size = 400M
bulk_insert_buffer_size = 400M
delay_key_write = ON
delayed_insert_limit = 10000

There are three indexes: one primary (auto-increment INT), one unique INT and one unique VARCHAR(40).

The problem is that after executing the LOAD DATA INFILE command, the first 3 GB of data are imported quickly (judging by the growth of table.MYD: 5-8 MB/s), but upon crossing the 3020 MB mark the import speed drops sharply, with table.MYD growing at only 0.5 MB/s. I've noticed that the import slows down once Key_blocks_unused gets drained to zero. This is the output of SHOW STATUS LIKE '%key%' at the beginning of the import:

mysql> show status like '%key%';
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| Com_preload_keys       | 0       | 
| Com_show_keys          | 0       | 
| Handler_read_key       | 0       | 
| Key_blocks_not_flushed | 57664   | 
| Key_blocks_unused      | 669364  | 
| Key_blocks_used        | 57672   | 
| Key_read_requests      | 7865321 | 
| Key_reads              | 57672   | 
| Key_write_requests     | 2170158 | 
| Key_writes             | 4       | 
+------------------------+---------+
10 rows in set (0.00 sec)
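The cache fill ratio can be computed directly from these counters. A small sketch (the 1024-byte default key_cache_block_size is an assumption, unless you've changed it):

```python
# Estimate key cache usage from SHOW STATUS counters.
# Assumes the default key_cache_block_size of 1024 bytes.
BLOCK_SIZE = 1024

def key_cache_report(blocks_used, blocks_unused):
    total = blocks_used + blocks_unused
    fill = blocks_used / total
    return fill, blocks_unused * BLOCK_SIZE / 2**20  # (fill ratio, free MB)

# Counters from the start of the import:
fill, free_mb = key_cache_report(57672, 669364)
print(f"fill: {fill:.1%}, free: {free_mb:.0f} MB")   # fill: 7.9%, free: 654 MB

# Counters after the 3020 MB mark -- the cache is completely full:
fill, free_mb = key_cache_report(727036, 0)
print(f"fill: {fill:.1%}, free: {free_mb:.0f} MB")   # fill: 100.0%, free: 0 MB
```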

and this is how it looks after the 3020 MB mark, i.e. when Key_blocks_unused drops to zero; that's when the bulk insert gets really slow:

mysql> show status like '%key%';
+------------------------+-----------+
| Variable_name          | Value     |
+------------------------+-----------+
| Com_preload_keys       | 0         | 
| Com_show_keys          | 0         | 
| Handler_read_key       | 0         | 
| Key_blocks_not_flushed | 727031    | 
| Key_blocks_unused      | 0         | 
| Key_blocks_used        | 727036    | 
| Key_read_requests      | 171275179 | 
| Key_reads              | 1163091   | 
| Key_write_requests     | 41181024  | 
| Key_writes             | 436095    | 
+------------------------+-----------+
10 rows in set (0.00 sec)

The problem is pretty clear, to my understanding: indexes are being stored in the cache, but once the cache fills up, the index blocks get written to disk one by one, which is slow, so the whole process slows down. If I disable the unique index on the VARCHAR(40) column, so that all the indexes fit into Key_blocks_used (I guess this is the variable directly dependent on key_buffer, isn't it?), the whole bulk import succeeds.

So I'm curious: how do I make MySQL flush all the Key_blocks_used data to disk at once and free up the cache? I understand that it might be doing some sorting on the fly, but I'd still expect it to be able to do some batched RAM-to-disk synchronization so that it can manage indexes even when they don't all fit into the memory cache. In short: how do I configure MySQL so that bulk inserting avoids a disk write for (almost) every index entry, even when the indexes don't all fit into the cache?

Last but not least: delay_key_write is set to 1 for this table, though it didn't add any speed-up compared to when it was disabled.

Thanks for any thoughts, ideas, explanations and RTFMs in advance ! (:

One more little question: how would I calculate how many VARCHAR(40) index entries fit into the cache before Key_blocks_unused drops to 0?
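A rough back-of-the-envelope for that sizing question is possible if you assume each VARCHAR(40) B-tree entry costs roughly the key length plus a ~6-byte row pointer, and that MyISAM index pages run about 2/3 full. Both figures are assumptions, not exact MyISAM internals:

```python
# Rough estimate of how many varchar(40) index entries fit in the key buffer
# before Key_blocks_unused hits zero.  The per-entry overhead (6-byte row
# pointer) and the ~2/3 B-tree page fill factor are assumptions.
KEY_BUFFER  = 800 * 2**20   # key_buffer = 800M from my.cnf
KEY_LEN     = 40            # varchar(40), assuming a single-byte charset
ROW_POINTER = 6             # default MyISAM data pointer size
FILL_FACTOR = 2 / 3         # typical B-tree page utilization

entries = int(KEY_BUFFER / (KEY_LEN + ROW_POINTER) * FILL_FACTOR)
print(f"~{entries / 1e6:.1f} million entries")   # ~12.2 million
```

That lands near 12 million entries, which would be consistent with the cache filling up partway through the 15 million rows.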

P.S. Disabling the indexes with $myisamchk --keys-used=0 -rq /path/to/db/tbl_name and then re-enabling them with $myisamchk -rq /path/to/db/tbl_name, as described in the MySQL docs, is a known solution. It works, but only when bulk-inserting into an empty table. When the table already contains data, index uniqueness checking is necessary, so disabling the indexes is not an option.

A: 

I have the same problem. After I configured MyISAM in MySQL, inserts are very slow.

Any support is appreciated...

GACP
+1  A: 

When you import data with LOAD DATA INFILE, I think MySQL performs the inserts one by one and, with each insert, tries to update the index file (.MYI) as well. This could slow down your import, as it consumes both I/O and CPU resources for each individual insert.

What you could do is add four statements to your import file: disable the keys of your table before the inserts and re-enable them at the end, and you should see the difference.

LOCK TABLES tableName WRITE;
ALTER TABLE tableName DISABLE KEYS;
-- your insert statements go here ...
ALTER TABLE tableName ENABLE KEYS;
UNLOCK TABLES;

If you don't want to edit your data file, try using mysqldump to get a proper dump file, and you shouldn't run into this slowness when importing the data.

##Dump the database
mysqldump databaseName > database.sql

##Import the database
mysql databaseName < database.sql
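One more option when the index no longer fits in the key buffer (a suggestion, not something from the original post): presort the input file by the unique VARCHAR(40) column before running LOAD DATA INFILE. The keys then arrive in ascending order, so B-tree inserts hit the same few "hot" index blocks instead of random ones, which keeps the working set inside the cache. A minimal sketch, where KEY_COL is a hypothetical column position:

```python
# Presort a tab-separated dump by one column so that index inserts become
# mostly sequential.  KEY_COL is hypothetical -- set it to the position of
# your unique varchar(40) column.  For a 6 GB file you would use an external
# sort (e.g. the Unix `sort` utility) rather than loading it into memory.
KEY_COL = 2

def presort(lines):
    return sorted(lines, key=lambda line: line.split("\t")[KEY_COL])

rows = ["1\ta\tzebra\n", "2\tb\tapple\n", "3\tc\tmango\n"]
for row in presort(rows):
    print(row, end="")   # prints the apple row, then mango, then zebra
```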

Hope this helps!

sai
A: 

I am not sure whether the key_buffer you mention is the same as key_buffer_size.

I faced a similar problem. Mine was resolved by bumping the key_buffer_size value up to something like 1 GB. Check my question here.

Shashikant Kore