Hi everyone,
I have an issue here: how to configure MySQL (MyISAM) properly so that a bulk insert (LOAD DATA INFILE) runs fast.
There is a 6 GB text file to be imported: 15 million rows, 16 columns (some int, some varchar(255), one varchar(40), one char(1), some datetime, one mediumtext).
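In case the exact statement matters, the import is essentially this (the path, delimiters and table name below are placeholders, not my real ones):

-- the import command; path, delimiters and table name are placeholders
load data infile '/path/to/dump.txt'
into table tbl_name
fields terminated by '\t'
lines terminated by '\n';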
The relevant my.cnf settings:
key_buffer = 800M
max_allowed_packet = 160M
thread_cache_size = 80
myisam_sort_buffer_size = 400M
bulk_insert_buffer_size = 400M
delay_key_write = ON
delayed_insert_limit = 10000
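Just to rule out a config mix-up, I verified that the running server actually picked these values up, with something like:

-- check the live values instead of trusting my.cnf
show variables like '%buffer_size%';
show variables like 'delay_key_write';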
There are three indexes: one primary key (auto-increment int), one unique int and one unique varchar(40).
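To make that concrete, the table is shaped roughly like this (the column names here are made up, only the types and the three indexes match the real table, and most of the varchar(255)/datetime columns are left out):

-- rough sketch of the table; column names are invented,
-- about ten more varchar(255)/datetime columns are omitted
create table tbl_name (
  id      int not null auto_increment,   -- primary key
  ext_id  int not null,                  -- the unique int
  hash    varchar(40) not null,          -- the unique varchar(40)
  flag    char(1),
  title   varchar(255),
  created datetime,
  body    mediumtext,
  primary key (id),
  unique key (ext_id),
  unique key (hash)
) engine=MyISAM;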
The problem is that after executing the LOAD DATA INFILE command, the first 3 GB of data are imported quickly (judging by the growth of table.MYD, roughly 5-8 MB/s), but after crossing about the 3020 MB mark the import speed drops sharply and table.MYD grows at only 0.5 MB/s. I've noticed that the slowdown starts once Key_blocks_unused drains to zero. Below is the output of mysql> show status like '%key%';
at the beginning of the import:
mysql> show status like '%key%';
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| Com_preload_keys       | 0       |
| Com_show_keys          | 0       |
| Handler_read_key       | 0       |
| Key_blocks_not_flushed | 57664   |
| Key_blocks_unused      | 669364  |
| Key_blocks_used        | 57672   |
| Key_read_requests      | 7865321 |
| Key_reads              | 57672   |
| Key_write_requests     | 2170158 |
| Key_writes             | 4       |
+------------------------+---------+
10 rows in set (0.00 sec)
And this is how it looks after the 3020 MB mark, i.e. when Key_blocks_unused has gone down to zero and the bulk insert gets really slow:
mysql> show status like '%key%';
+------------------------+-----------+
| Variable_name          | Value     |
+------------------------+-----------+
| Com_preload_keys       | 0         |
| Com_show_keys          | 0         |
| Handler_read_key       | 0         |
| Key_blocks_not_flushed | 727031    |
| Key_blocks_unused      | 0         |
| Key_blocks_used        | 727036    |
| Key_read_requests      | 171275179 |
| Key_reads              | 1163091   |
| Key_write_requests     | 41181024  |
| Key_writes             | 436095    |
+------------------------+-----------+
10 rows in set (0.00 sec)
The problem is pretty clear, to my understanding: index blocks are kept in the key cache, but once the cache fills up, the index blocks get written to disk one by one, which is slow, so the whole process slows down. If I disable the unique index on the varchar(40) column, so that all the remaining indexes fit into Key_blocks_used (I guess this is the variable directly dependent on key_buffer, isn't it?), the whole bulk import succeeds. So I'm curious: how do I make MySQL write all the cached key blocks to disk at once and free up Key_blocks_used? I understand it might be doing some sorting on the fly, but still, I'd expect it to be able to do some cached RAM-to-disk synchronization so that indexes are handled properly even when they don't all fit into the memory cache. So my question is: how do I configure MySQL so that bulk inserting avoids writing to disk on (almost) every index update, even when all the indexes don't fit into the cache? Last but not least: delay_key_write is set to 1 for this table, though it didn't add any speed-up compared to when it was disabled.
Thanks for any thoughts, ideas, explanations and RTFMs in advance ! (:
One more little question: how would I calculate how many varchar(40) index entries will fit into the cache before Key_blocks_unused gets to 0?
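Here is my own back-of-the-envelope attempt, for what it's worth. The per-entry size is purely my assumption (40-byte key value, 1 length byte, ~6-byte record pointer, roughly 2/3 B-tree page fill factor, and ignoring the two int indexes), so corrections are welcome:

-- rough guess at how many varchar(40) key entries fit before the key cache is full
-- assumptions: 40 + 1 + 6 bytes per entry, ~2/3 page fill factor
select
  @@key_buffer_size                              as key_buffer_bytes,
  (40 + 1 + 6)                                   as approx_bytes_per_entry,
  floor(@@key_buffer_size / (40 + 1 + 6) * 0.67) as rough_max_entries;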
P.S. Disabling the indexes with
$ myisamchk --keys-used=0 -rq /path/to/db/tbl_name
and then re-enabling them with
$ myisamchk -rq /path/to/db/tbl_name
as described in the MySQL docs is a known solution, and it works, but only when bulk-inserting into an empty table. When there is already data in the table, the uniqueness checks on the indexes are necessary, so disabling the indexes is not an option.