
Hi, I'm in the process of optimizing an import of ~10 TB of data into a MySQL database. Currently, I can import 2.9 GB (+0.8 GB index) in about 14 minutes on a current laptop. The process consists of reading a data file (an Oracle ".dat" export), parsing the data, writing it to a CSV file, and executing the "LOAD DATA LOCAL" SQL command on it.
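For scale, here is a rough extrapolation of the measured rate to the full data set. It assumes 10 TB = 10,000 GB and that throughput stays constant, which it may not, since index maintenance can slow down as the tables grow:

```latex
\frac{10\,000\ \text{GB}}{2.9\ \text{GB}} \times 14\ \text{min}
  \approx 3448 \times 14\ \text{min}
  \approx 48\,300\ \text{min}
  \approx 805\ \text{h}
  \approx 33.5\ \text{days}
```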

Is it possible to increase the import speed without hardware changes? Is there a way to skip writing a file to the file system only to have MySQL read it back? Can the data be streamed from memory directly to MySQL (e.g., via the JDBC driver)?

Many thanks in advance, Joerg.

A: 

This may be what you are looking for (from the MySQL manual); test it yourself:

On Unix, if you need LOAD DATA to read from a pipe, you can use the following technique (the example loads a listing of the / directory into the table db1.t1):

mkfifo /mysql/data/db1/ls.dat
chmod 666 /mysql/data/db1/ls.dat
find / -ls > /mysql/data/db1/ls.dat &
mysql -e "LOAD DATA INFILE 'ls.dat' INTO TABLE t1" db1

Note that you must run the command that generates the data to be loaded and the mysql commands either on separate terminals, or run the data generation process in the background (as shown in the preceding example). If you do not do this, the pipe will block until data is read by the mysql process.
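Applied to the question's pipeline, the same idea lets the parser write CSV straight into a named pipe that LOAD DATA reads, so no intermediate file hits the disk. Below is a minimal bash sketch assuming a Unix system. `stream_via_fifo`, `produce_csv`, `load_csv`, the `parse_dat` parser, and `db1`/`t1` are all hypothetical names. With LOCAL the mysql client reads the pipe, so it can live anywhere on the client machine, but `local_infile` must be enabled on the server and the client must be started with `--local-infile=1`:

```shell
#!/usr/bin/env bash
# Sketch: stream rows from a producer into a consumer through a named pipe,
# so no intermediate CSV file is written to disk.

# stream_via_fifo PRODUCER CONSUMER
#   PRODUCER: command/function that writes CSV rows to stdout
#   CONSUMER: command/function that receives the FIFO path as $1
stream_via_fifo() {
  local producer=$1 consumer=$2 dir fifo writer rc wrc
  dir=$(mktemp -d) || return 1
  fifo="$dir/data.fifo"
  mkfifo "$fifo" || { rm -rf "$dir"; return 1; }

  # The writer must run in the background: opening a FIFO for writing
  # blocks until a reader opens it.
  "$producer" > "$fifo" &
  writer=$!

  "$consumer" "$fifo"
  rc=$?

  # If the consumer failed before opening the pipe, the writer is still
  # blocked in open(); stop it so that wait does not hang.
  if [ "$rc" -ne 0 ]; then
    kill "$writer" 2>/dev/null
  fi
  wait "$writer"
  wrc=$?

  rm -rf "$dir"
  if [ "$rc" -ne 0 ]; then return "$rc"; fi
  return "$wrc"
}

# Hypothetical real use: parse_dat stands in for your .dat parser,
# db1/t1 for your database and table.
produce_csv() { parse_dat export.dat; }
load_csv() {
  mysql --local-infile=1 db1 -e \
    "LOAD DATA LOCAL INFILE '$1' INTO TABLE t1 FIELDS TERMINATED BY ','"
}
# stream_via_fifo produce_csv load_csv
```

Passing functions rather than command strings keeps the SQL quoting readable. It also lets you swap the consumer for something like `wc -l` to check the parser output before pointing it at the database.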

According to the MySQL manual section "Speed of INSERT Statements", LOAD DATA INFILE is usually 20 times faster than using INSERT statements. So streaming the rows as INSERTs (e.g., through JDBC) is probably not what you are looking for, but if it matters to you, test it yourself.

According to the MySQL manual section "LOAD DATA INFILE Syntax":

Using LOCAL is a bit slower than letting the server access the files directly, because the contents of the file must be sent over the connection by the client to the server. On the other hand, you do not need the FILE privilege to load local files.

So if you put the files on the server host and load them without LOCAL, loading might be faster. Test it yourself.
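One way to test it is to time the same file loaded both ways. Below is a small bash sketch assuming comma-separated input. `load_data_sql` is a hypothetical helper, and `db1`, `t1`, and both paths are placeholders. For the non-LOCAL variant, the file must already be on the database host in a directory the server is allowed to read (see the `secure_file_priv` system variable), and the account needs the FILE privilege:

```shell
#!/usr/bin/env bash
# Sketch: build a LOAD DATA statement with or without LOCAL so that both
# variants can be timed against the same CSV file.
# Usage: load_data_sql local|server FILE TABLE
# (FILE must not contain single quotes; no SQL escaping is done here.)
load_data_sql() {
  local mode=$1 file=$2 table=$3 kw
  case $mode in
    local)  kw="LOCAL " ;;  # client reads FILE and sends it over the connection
    server) kw="" ;;        # server reads FILE itself (needs the FILE privilege)
    *) echo "mode must be 'local' or 'server'" >&2; return 2 ;;
  esac
  printf "LOAD DATA %sINFILE '%s' INTO TABLE %s FIELDS TERMINATED BY ','\n" \
    "$kw" "$file" "$table"
}

# Hypothetical timing runs (db1, t1 and both paths are placeholders):
# time mysql --local-infile=1 db1 -e "$(load_data_sql local ./data.csv t1)"
# time mysql db1 -e "$(load_data_sql server /var/lib/mysql-files/data.csv t1)"
```

Empty the table between runs (e.g., with TRUNCATE TABLE t1) so both loads start from the same state.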

This is my speculation: keys, indexes, and/or constraints may be slowing down your bulk load, so the following may speed things up. On the other hand, you will eventually have to deal with them anyway (the final INSERT ... SELECT still has to maintain them), so it may also slow things down. Test it yourself.

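-- Run all of these in one session: a TEMPORARY table exists only for the connection that created it.
-- Staging table with realTable's columns but none of its indexes or constraints
CREATE TEMPORARY TABLE dataHold SELECT * FROM realTable LIMIT 0;
-- '/path/to/data.csv' is a placeholder for your CSV file
LOAD DATA INFILE '/path/to/data.csv' INTO TABLE dataHold FIELDS TERMINATED BY ',';
INSERT INTO realTable SELECT * FROM dataHold;
DROP TEMPORARY TABLE dataHold;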