views: 581
answers: 5

Hi guys,

I'm trying to load a 95 GB CSV file into a MySQL database (MySQL 5.1.36) with the following statements:

CREATE TABLE MOD13Q1 (
  rid INT UNSIGNED NOT NULL AUTO_INCREMENT,
  gid MEDIUMINT(6) UNSIGNED NOT NULL,
  yr SMALLINT(4) UNSIGNED NOT NULL,
  dyyr SMALLINT(4) UNSIGNED NOT NULL,
  ndvi DECIMAL(7,4) NOT NULL COMMENT 'NA value is 9',
  reliability TINYINT(4) NOT NULL COMMENT 'NA value is 9',
  ndviquality1 TINYINT(1) NOT NULL,
  ndviquality2 TINYINT(1) NOT NULL,
  PRIMARY KEY (rid),
  KEY (gid)
) ENGINE = MyISAM;

LOAD DATA INFILE 'datafile.csv' INTO TABLE MOD13Q1
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
  IGNORE 1 LINES
  (gid, yr, dyyr, ndvi, reliability, ndviquality1, ndviquality2);

I'm running this script from the Windows command prompt (DOS) at the moment, but the database is not responding. It works fine for smaller .csv files (1.5 GB). Will it work for a file of this size?

Do you have any recommendations on how to do this more efficiently/faster? Would ENGINE = CSV be an alternative? (It doesn't support indexes, so queries might run very slowly.)

Thanks and cheers, Jan

+1  A: 

You should disable all the constraints while you are importing. Apart from that, I think it should work properly; just be aware that it will probably take a while, likely hours.
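
For illustration, "disabling the constraints" for a MyISAM bulk load might look roughly like the sketch below. The session settings and the DISABLE KEYS / ENABLE KEYS pair are standard MySQL features, but this is an assumed recipe for the table in the question, not necessarily what was actually run:

SET SESSION unique_checks = 0;
SET SESSION foreign_key_checks = 0;   -- no effect on MyISAM, but harmless
ALTER TABLE MOD13Q1 DISABLE KEYS;     -- defer building the non-unique indexes

LOAD DATA INFILE 'datafile.csv' INTO TABLE MOD13Q1
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
  IGNORE 1 LINES
  (gid, yr, dyyr, ndvi, reliability, ndviquality1, ndviquality2);

ALTER TABLE MOD13Q1 ENABLE KEYS;      -- rebuild the indexes in one pass
SET SESSION unique_checks = 1;
SET SESSION foreign_key_checks = 1;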

RageZ
+2  A: 

There is no easy way; you will have to split your data into chunks and then import those...
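
As a sketch of what that could look like once the file has been split with an external tool (the chunk file names below are hypothetical), each piece gets its own LOAD DATA statement:

LOAD DATA INFILE 'datafile_part01.csv' INTO TABLE MOD13Q1
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
  IGNORE 1 LINES   -- only the first piece still carries the header row
  (gid, yr, dyyr, ndvi, reliability, ndviquality1, ndviquality2);

LOAD DATA INFILE 'datafile_part02.csv' INTO TABLE MOD13Q1
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
  (gid, yr, dyyr, ndvi, reliability, ndviquality1, ndviquality2);

-- ...and so on for the remaining pieces.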

Sarfraz
+1  A: 

Thanks, guys, for the tips. It worked!

mysql> LOAD DATA INFILE 'E:\\AAJan\\data\\data.csv' INTO TABLE MOD13Q1
    -> FIELDS TERMINATED by ','
    ->     LINES TERMINATED BY '\r\n'
    ->     IGNORE 1 LINES
    ->     (gid, yr, dyyr, ndvi, reliability,
    ->     ndviquality1, ndviquality2
    ->     ) ;
Query OK, -1923241485 rows affected (18 hours 28 min 51.26 sec)
Records: -1923241485  Deleted: 0  Skipped: 0  Warnings: 0

mysql>

Hope this is helpful for others who want to avoid splitting the data up into chunks. Thanks for the help. Cheers, Jan

Jan
A: 

bcp? ... Oh wait. It does not matter anyway; it will be some kind of bulk load either way, so you need chunks. You need them to avoid overfilling your log segment space and hitting lock count limits. Anything greater than a million rows at a time is too much. The best-known batch size for bcp is 10,000 records!
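
Translated to the MySQL setup in this thread, a batched copy could look something like the sketch below, assuming the raw file has first been loaded into a hypothetical staging table MOD13Q1_staging with the same columns (MyISAM is not transactional, so here the batching mainly keeps each statement small):

DELIMITER //
CREATE PROCEDURE copy_in_batches()
BEGIN
  -- Copy rows in batches of 10,000, keyed on the auto-increment rid.
  DECLARE last_rid INT UNSIGNED DEFAULT 0;
  DECLARE max_rid INT UNSIGNED DEFAULT 0;
  SELECT COALESCE(MAX(rid), 0) INTO max_rid FROM MOD13Q1_staging;
  WHILE last_rid < max_rid DO
    INSERT INTO MOD13Q1 (gid, yr, dyyr, ndvi, reliability, ndviquality1, ndviquality2)
      SELECT gid, yr, dyyr, ndvi, reliability, ndviquality1, ndviquality2
      FROM MOD13Q1_staging
      WHERE rid > last_rid AND rid <= last_rid + 10000;
    SET last_rid = last_rid + 10000;
  END WHILE;
END //
DELIMITER ;

CALL copy_in_batches();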

RocketSurgeon
A: 

Hi Jan, I have a requirement very similar to yours. A 12 GB file took more than 12 hours and still didn't finish loading. I stopped it, split it up into files containing 50 million records each, and it then took 7 hours to load them. How can I make this faster? Did you set any configuration variables before successfully loading the 95 GB of data? Did you take any other precautions? Please help me.

Thank you for your help, Sivaram

Sivaram