ansaurus

Question

Table with 80 million records and adding an index takes more than 18 hours (or forever)! Now what?

Answer 1

+1 A:

OK, it turns out this was more than a simple "create a table, index it, and forget it" problem :) Here's what I did, in case someone else faces the same problem (I use IP addresses as the example, but it works for other data types too):

Problem: Your table has millions of rows and you need to add an index quickly.

Use case: Consider storing millions of IP addresses in a lookup table. Inserting the IP addresses is not a big problem, but creating an index on them takes more than 14 hours.

Solution: Partition the table using MySQL's partitioning support.

Case #1: When the table you want is not yet created

CREATE TABLE IPADDRESSES(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY(id, ipaddress)
) ENGINE=MYISAM
PARTITION BY HASH(ipaddress)
PARTITIONS 20;
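Once rows are loaded, it can be useful to check how they spread across the 20 partitions. A minimal sketch, assuming MySQL 5.1 or later (where `INFORMATION_SCHEMA.PARTITIONS` is available) and that the table lives in the currently selected database:

```sql
-- List each partition of IPADDRESSES with its row count.
-- Assumes the current database (DATABASE()) is the one holding the table.
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'IPADDRESSES'
ORDER BY PARTITION_ORDINAL_POSITION;
```

A heavily skewed distribution would suggest that the hash on `ipaddress` is not splitting the data evenly for your particular values.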

Case #2: When the table already exists. There seems to be a way to do this with ALTER TABLE, but I have not yet worked out a proper solution. Instead, here is a slightly inefficient workaround. First create a temporary, unpartitioned table:

CREATE TABLE IPADDRESSES_TEMP(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY(id)
) ENGINE=MYISAM;

Insert your IP addresses into this table, then create the actual table with partitions:

CREATE TABLE IPADDRESSES(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY(id, ipaddress)
) ENGINE=MYISAM
PARTITION BY HASH(ipaddress)
PARTITIONS 20;

Finally, copy the data across, drop the temporary table, and add the index:

INSERT INTO IPADDRESSES(ipaddress) SELECT ipaddress FROM IPADDRESSES_TEMP;
DROP TABLE IPADDRESSES_TEMP;
ALTER TABLE IPADDRESSES ADD INDEX(ipaddress);

And there you go... indexing on the new table took me about 2 hours on a 3.2GHz machine with 1GB RAM :) Hope this helps.
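For the ALTER TABLE route mentioned under Case #2, here is a minimal, untested sketch. It assumes the existing unpartitioned table is IPADDRESSES_TEMP as defined above, that it contains no NULL `ipaddress` values, and that your MySQL version supports partitioning (5.1 or later). MySQL requires every primary or unique key to include the partitioning column, so the key is widened first. Each statement rebuilds the table, so it may not be any faster than the copy approach.

```sql
-- Step 1: make ipaddress NOT NULL and part of the primary key,
-- since the partitioning column must appear in every unique key.
ALTER TABLE IPADDRESSES_TEMP
  MODIFY ipaddress BIGINT UNSIGNED NOT NULL,
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, ipaddress);

-- Step 2: partition the existing table in place.
ALTER TABLE IPADDRESSES_TEMP
  PARTITION BY HASH(ipaddress)
  PARTITIONS 20;
```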

Legend 2010-09-16 20:11:30

Answer 2

+1 A:

You are using MyISAM, which is being deprecated soon. An alternative would be InnoDB.

"InnoDB is a transaction-safe (ACID compliant) storage engine for MySQL that has commit, rollback, and crash-recovery capabilities to protect user data. InnoDB row-level locking (without escalation to coarser granularity locks) and Oracle-style consistent nonlocking reads increase multi-user concurrency and performance. InnoDB stores user data in clustered indexes to reduce I/O for common queries based on primary keys. To maintain data integrity, InnoDB also supports FOREIGN KEY referential-integrity constraints. You can freely mix InnoDB tables with tables from other MySQL storage engines, even within the same statement."

http://dev.mysql.com/doc/refman/5.0/en/innodb.html

According to http://dev.mysql.com/tech-resources/articles/storage-engine/part_1.html, you should be able to switch between engines with a simple ALTER command, which gives you some flexibility. The article also states that each table in your database can be configured independently.
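As a rough illustration of that ALTER command, assuming the table is the IPADDRESSES table from the first answer and that it sits in the currently selected database:

```sql
-- Convert the table from MyISAM to InnoDB; this rebuilds the whole table.
ALTER TABLE IPADDRESSES ENGINE=InnoDB;

-- Confirm which engine the table now uses.
SELECT TABLE_NAME, ENGINE
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'IPADDRESSES';
```

Because the conversion copies every row, expect it to take a while on a table with tens of millions of records.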

Michael Eakins 2010-09-16 20:18:02

@Meakins: Thanks for the pointers. However, I am currently looking for a MyISAM-specific solution, though I will prepare a plan to move to InnoDB afterwards.

Legend 2010-09-16 20:42:00

No problem, just trying to help :)

Michael Eakins 2010-09-17 12:29:06
