Hi! I've got an application which needs to run a daily script; the daily script consists of downloading a CSV file with 1,000,000 rows and inserting those rows into a table.

I host my application on Dreamhost. I created a while loop that goes through all the CSV's rows and performs an INSERT query for each one. The thing is that I get a "500 Internal Server Error". Even if I split it into 1,000 files with 1,000 rows each, I can't insert more than 40 or 50 thousand rows in the same loop.

Is there any way I could optimize the import? I'm also considering moving to a dedicated server; what do you think?

Thanks!

Pedro

+9  A: 

Most databases have an optimized bulk insertion process - MySQL's is the LOAD DATA INFILE syntax.

To load a CSV file, use:

LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n'
  IGNORE 1 LINES;
OMG Ponies
+1 but I doubt this will work in a shared hosting scenario, since the database server is usually separated from the web server and hence won't have access to user files.
casablanca
If you can't upload the data file to the db server host, use `LOAD DATA LOCAL INFILE ...` instead; that way the data file is read from the client end of the MySQL connection, which in this case is the PHP app host.
Bill Karwin
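
A rough sketch of Bill Karwin's suggestion from the PHP side, using mysqli (hypothetical connection details; assumes local-infile is enabled on both the client and the server):

<?php
// Hypothetical credentials and paths; LOCAL INFILE reads the file from the PHP host.
$db = mysqli_init();
mysqli_options($db, MYSQLI_OPT_LOCAL_INFILE, true);
mysqli_real_connect($db, 'mysql.example.com', 'user', 'pass', 'mydb');

$sql = "LOAD DATA LOCAL INFILE '/home/user/data.csv' INTO TABLE tbl_name
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\r\\n'
        IGNORE 1 LINES";

if (!mysqli_query($db, $sql)) {
    die('Load failed: ' . mysqli_error($db));
}
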
+4  A: 

Insert multiple rows per statement. Instead of doing

insert into tbl_name values (1,2);

do

insert into tbl_name values (1,2),(2,3),(4,5);

Up to an appropriate number of rows at a time.
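
In PHP that might look something like this (tbl_name, col1 and col2 are placeholders; assumes an existing mysql_connect() connection and numeric values):

<?php
// Build one multi-row INSERT per chunk instead of one query per row.
$rows   = array(array(1, 2), array(2, 3), array(4, 5));   // e.g. rows parsed from the CSV
$chunks = array_chunk($rows, 1000);                        // 1000 rows per statement

foreach ($chunks as $chunk) {
    $values = array();
    foreach ($chunk as $row) {
        $values[] = '(' . (int)$row[0] . ',' . (int)$row[1] . ')';
    }
    mysql_query('INSERT INTO tbl_name (col1, col2) VALUES ' . implode(',', $values));
}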

Or do bulk import, which is the most efficient way of loading data, see

http://dev.mysql.com/doc/refman/5.0/en/load-data.html

nos
A: 

You can create a cron job script which adds x records to the database per run. The script checks whether the last import added all the required rows; if not, it takes another x rows.

That way you can add as many rows as you need.

If you have your own dedicated server it's even easier: you just run a loop with all the insert queries.

Of course you can also try setting the time limit to 0 (if that works on Dreamhost) or making it bigger.
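
A minimal sketch of that idea, with a hypothetical offset file to remember how far the previous run got (placeholder file and table names, old mysql_* API):

<?php
// Each cron run imports the next $batch rows and records its progress.
$batch  = 10000;                                   // rows per run; tune to what the host tolerates
$offset = file_exists('offset.txt') ? (int)file_get_contents('offset.txt') : 0;

$fh = fopen('data.csv', 'r');
for ($i = 0; $i < $offset; $i++) {
    fgetcsv($fh);                                  // skip rows already imported
}

$done = 0;
while ($done < $batch && ($row = fgetcsv($fh)) !== false) {
    mysql_query("INSERT INTO tbl_name (col1, col2) VALUES ('"
        . mysql_real_escape_string($row[0]) . "','"
        . mysql_real_escape_string($row[1]) . "')");
    $done++;
}
fclose($fh);

file_put_contents('offset.txt', $offset + $done);  // remember where to resume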

Vaidas Zilionis
A: 

Your PHP script is most likely being terminated because it exceeded the script time limit. Since you're on a shared host, you're pretty much out of luck.

If you do switch to a dedicated server and if you get shell access, the best way would be to use the mysql command-line tool to insert the data.

casablanca
A: 

OMG Ponies' suggestion is great, but I've also 'manually' formatted data into the same format that mysqldump uses, then loaded it that way. Very fast.

mkoistinen
A: 

Have you tried using transactions? Just send the command BEGIN to MySQL, do all your inserts, then do COMMIT. This would speed it up significantly, but like casablanca said, your script is probably timing out as well.
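
Roughly, something like this as a sketch (assumes an InnoDB table, since MyISAM ignores transactions; $queries stands in for your generated INSERT statements):

<?php
// Wrap the whole batch in one transaction so the work is flushed once, at COMMIT.
mysql_query('BEGIN');
foreach ($queries as $sql) {
    mysql_query($sql);
}
mysql_query('COMMIT');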

jonescb
+1  A: 

Normally I would say just use LOAD DATA INFILE, but it seems you can't with your shared hosting environment.

I haven't used MySQL in a few years, but they have a very good document which describes how to speed up bulk insertions: http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html

A few ideas that can be gleaned from this:

  • Disable/enable keys around the insertions:

    ALTER TABLE tbl_name DISABLE KEYS; ALTER TABLE tbl_name ENABLE KEYS;

  • Use many values in your insert statements.

    E.g.: INSERT INTO tbl_name (col1, col2) VALUES (val1, val2),(.., ..), ...

    If I recall correctly, you can have up to 4096 values per insertion statement.

  • Run a FLUSH TABLES command before you even start, to ensure that there are no pending disk writes that may hurt your insertion performance.

I think this will make things fast. I would suggest using LOCK TABLES, but I think disabling the keys makes that moot.

UPDATE

I realized after reading this that by disabling your keys you may remove consistency checks that are important for your file loading. You can fix this by:

  • Ensuring that your table has no data that "collides" with the new data being loaded (if you're starting from scratch, a TRUNCATE statement will be useful here).
  • Writing a script to clean your input data to ensure no duplicates locally. Checking for duplicates is probably costing you a lot of database time anyway.
  • If you do this, ENABLE KEYS should not fail.
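
Putting those pieces together, the overall order might look roughly like this (placeholder table name; assumes a MyISAM table, where DISABLE KEYS has the most effect, with $multiRowInserts standing in for your pre-built multi-value INSERT statements):

<?php
// Rough order of operations for the bulk load; each step mirrors a point above.
mysql_query('FLUSH TABLES');                        // flush pending writes first
mysql_query('TRUNCATE TABLE tbl_name');             // only if you are reloading from scratch
mysql_query('ALTER TABLE tbl_name DISABLE KEYS');   // skip index maintenance during the load

foreach ($multiRowInserts as $sql) {
    mysql_query($sql);
}

mysql_query('ALTER TABLE tbl_name ENABLE KEYS');    // rebuild the indexes once, at the end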
Mike Axiak
A: 

I've run into this problem myself before, and nos pretty much hit it on the head, but you'll need to do a bit more to get the best performance.

I found that in my situation I couldn't get MySQL to accept one large INSERT statement, but if I split it up into groups of about 10k INSERTs at a time, like nos suggested, it'll do its job pretty quickly. One thing to note is that when doing multiple INSERTs like this you will most likely hit PHP's timeout limit, but this can be avoided by resetting the timeout with set_time_limit($seconds); I found that doing this after each successful INSERT worked really well.

You have to be careful about doing this, because you could accidentally end up in a loop with an unlimited timeout. To guard against that, I would suggest testing that each INSERT was successful, either by checking for errors reported by MySQL with mysql_errno() or mysql_error(), or by checking the number of rows affected by the INSERT with mysql_affected_rows(). You could then stop after the first error happens.
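
A sketch of that loop, with $chunks as a hypothetical array of pre-built multi-row INSERT statements:

<?php
// Reset the timeout after each successful chunk and stop on the first error.
foreach ($chunks as $sql) {
    mysql_query($sql);
    if (mysql_errno()) {          // the INSERT failed; report and stop
        echo 'Import stopped: ' . mysql_error();
        break;
    }
    set_time_limit(30);           // give the script another 30 seconds for the next chunk
}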

chrispen