views:

35

answers:

2

I have a MyISAM table with ~50'000'000 records (tasks for web crawler):

CREATE TABLE `tasks2` (
    `id` int(11) NOT NULL auto_increment,
    `url` varchar(760) character set latin1 NOT NULL,
    `state` varchar(10) collate utf8_bin default NULL,
    `links_depth` int(11) NOT NULL,
    `sites_depth` int(11) NOT NULL,
    `error_text` text character set latin1,
    `parent` int(11) default NULL,
    `seed` int(11) NOT NULL,
    `random` int(11) NOT NULL default '0',
    PRIMARY KEY  (`id`),
    UNIQUE KEY `URL_UNIQUE` (`url`),
        KEY `next_random_task` (`state`,`random`)
) ENGINE=MyISAM AUTO_INCREMENT=61211954 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

Once every few seconds one of the following operations occur (but never simultaneously):

  1. INSERT ... VALUES (500 rows) - inserts new tasks
  2. UPDATE ... WHERE id IN (up to 10 ids) - updates state for batch of tasks
  3. SELECT ... WHERE (by next_random_task index) - loads batch of tasks for processing

My problem is that inserts and updates are very slow - running on the order of tens of seconds, sometimes over a minute. Selects are fast, though. Why could this happen and how to improve performance?

+1  A: 
  • EXPLAIN a sample UPDATE against the full table to ensure the primary key index is being used.

  • Consider changing state to a TINYINT or ENUM to make its index smaller. (ENUM might not actually do this).

  • Do you need the unique key on url? This will slow down inserts.

Andrew Duffy
+1  A: 

~50M on a regular hardware is a decent number.

Please go through this question on sf (even though it is written for InoDB, there are similar parameters for MyISAM)

After that you should start the cycle of

  • identifying (logging) slow queries to understand you patterns (or confirm your assumptions)
  • tweaking my.cnf or adding/removing indexes (depending on the patterns)
  • measuring improvements
Unreason