I have a CSV file that has 3.5 million codes in it.
I should point out that this is only EVER going to be run once.

The CSV looks like this:

age9tlg,  
rigfh34,  
...

Here is my code:

ini_set('max_execution_time', 600);
ini_set("memory_limit", "512M");
$file_handle = fopen("Weekly.csv", "r");
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);

    if (is_array($line_of_text)) {
        foreach ($line_of_text as $col) {
            if (!empty($col)) {
                mysql_query("insert into `action_6_weekly` Values('$col', '')") or die(mysql_error());
            }
        }
    } else {
        if (!empty($line_of_text)) {
            mysql_query("insert into `action_6_weekly` Values('$line_of_text', '')") or die(mysql_error());
        }
    }
}
fclose($file_handle);

Is this code going to die part way through on me? Will my memory and max execution time be high enough?

NB: This code will be run on my localhost, and the database is on the same PC, so latency is not an issue.


Update:
Here is another possible implementation. This one does bulk inserts of 2000 records at a time:

$file_handle = fopen("Weekly.csv", "r");
$vals = array();
while (!feof($file_handle)) {
    $line_of_text = fgetcsv($file_handle);

    if (is_array($line_of_text)) {
        foreach ($line_of_text as $col) {
            if (!empty($col)) {
                $vals[] = "('$col', '')";
            }
        }
    } else {
        if (!empty($line_of_text)) {
            $vals[] = "('$line_of_text', '')";
        }
    }

    // flush a batch once 2000 values have accumulated
    if (count($vals) >= 2000) {
        mysql_query("insert into `action_6_weekly` Values " . implode(', ', $vals)) or die(mysql_error());
        $vals = array();
    }
}

// insert whatever is left over after the last full batch
if (!empty($vals)) {
    mysql_query("insert into `action_6_weekly` Values " . implode(', ', $vals)) or die(mysql_error());
}
fclose($file_handle);

If I was to use this method, what is the highest value I could set it to insert at once?


Update 2
So, I've found I can use:

LOAD DATA LOCAL INFILE 'C:\\xampp\\htdocs\\weekly.csv'
INTO TABLE `action_6_weekly`
FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY ','
(`code`)

but the issue now is that I was wrong about the CSV format; it is actually 4 codes and then a line break, so:

fhroflg,qporlfg,vcalpfx,rplfigc,
vapworf,flofigx,apqoeei,clxosrc,
...

So I need to be able to specify two LINES TERMINATED BY values.
This question has been branched out to Here.


Update 3
Setting it to do bulk inserts of 20k rows, using:

while (!feof($file_handle)) {
   $val[] = fgetcsv($file_handle);
   $i++;
   if($i == 20000) {
      //do insert
      //set $i = 0;
      //$val = array();
   }
}

//do insert (for the last few rows that don't reach 20k)

But it dies at this point because, for some reason, $val contains 75k rows. Any idea why?
Note: the above code is simplified.

+2  A: 

Is this code going to die part way through on me? Will my memory and max execution time be high enough?

Why don't you try and find out?

You can adjust both the memory (memory_limit) and execution time (max_execution_time) limits, so if you really have to use that, it shouldn't be a problem.
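
For a one-off script like this, the limits can even be lifted entirely; a small sketch (the values -1 and 0 mean "no limit"):

ini_set('memory_limit', '-1');   // -1 removes the memory limit
set_time_limit(0);               // 0 removes the execution time limit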

Note that MySQL supports delayed and multiple row insertion:

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

http://dev.mysql.com/doc/refman/5.1/en/insert.html

NullUserException
+19  A: 

I doubt this will be the popular answer, but I would have your PHP application run mysqlimport on the CSV file. Surely it is optimized far beyond what you will do in PHP.
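
For instance, a rough sketch of shelling out to mysqlimport from PHP (the database name, credentials and path below are placeholders, mysqlimport derives the table name from the file name, and the terminator options would have to match the real file layout):

// hypothetical path; the file must be named after the target table
$csv = 'C:\\xampp\\htdocs\\action_6_weekly.csv';
$cmd = 'mysqlimport --local --fields-terminated-by="," '
     . '--user=root --password=secret mydb ' . escapeshellarg($csv);
exec($cmd . ' 2>&1', $output, $status);
if ($status !== 0) {
    die("mysqlimport failed:\n" . implode("\n", $output));
}
echo "Import finished.";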

Ah the classic "I'm going to be downvoted for this". The tactic works with me: +1.
Artefacto
And is it possible to set two LINES TERMINATED BY values, considering this maps directly onto LOAD DATA INFILE?
Hailwood
A: 

You should accumulate the values and insert them into the database all at once at the end, or in batches every x records. Doing a single query for each row means 3.5 million SQL queries, each carrying quite some overhead.

Also, you should run this on the command line, where you won't need to worry about execution time limits.

The real answer, though, is evilclown's: importing CSV data into MySQL is already a solved problem.

deceze
The CSV is not in the right format; it is missing some columns.
Hailwood
@Hailwood That shouldn't be a problem, see the examples in the manual: `LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata (col1,col2,...);`
deceze
Would you mind telling me what I need to write? The CSV is in the format of `code,` (newline) `code,` (newline) `code,` (newline) `code,` (newline), but I need to insert in the format of ('', code, 0).
Hailwood
Wouldn't it be quicker to manipulate the CSV into something useful and import that?
Keyo
@Hailwood Apply a default value of `''` and `0` to the first and third column respectively (in the table definition) and only insert the second column with `... INTO TABLE foo (second_column_name)`.
deceze
Please see Update 2 above.
Hailwood
+2  A: 

I hope there is not a web client waiting for a response on this. Other than calling the import utility already referenced, I would start this as a job and return feedback to the client almost immediately. Have the insert loop update a percentage-complete somewhere so the end user can check the status, if you absolutely must do it this way.
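
For example, the insert loop could periodically write its progress to a small status file that a separate page polls; a minimal sketch (the file name and reporting interval are purely illustrative, and the actual insert is left out):

$total       = 3500000;   // approximate number of codes, from the question
$processed   = 0;
$lastReport  = 0;
$file_handle = fopen("Weekly.csv", "r");
while (($line = fgetcsv($file_handle)) !== false) {
    // ... do the actual insert for $line here, as in the other answers ...
    $processed += count($line);
    if ($processed - $lastReport >= 100000) {   // report roughly every 100k codes
        file_put_contents('import_progress.txt',
            round($processed / $total * 100) . '% complete');
        $lastReport = $processed;
    }
}
fclose($file_handle);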

A: 

Two possible ways:

1) Batch the process, then have a scheduled job import the file while updating a status. This way, you can have a page that keeps checking the status and refreshes itself if the status is not yet 100%. Users will have a live update of how much has been done. But for this you need access to the OS to be able to set up the scheduled task, and the task will sit idle when there is nothing to import.

2) Have the page handle 1000 rows (or any N number of rows... you decide), then send some JavaScript to the browser to refresh itself with a new parameter telling the script to handle the next 1000 rows. You can also display a status to the user while this is happening. The only problem is that if the page somehow does not refresh, the import stops.
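
A rough sketch of option 2, assuming a hypothetical import.php that is called repeatedly with an ?offset= parameter and an already-open mysql_* connection as in the question (the batch size of 1000 is illustrative):

$batchSize = 1000;
$offset    = isset($_GET['offset']) ? (int)$_GET['offset'] : 0;

$file_handle = fopen("Weekly.csv", "r");

// skip the lines already handled by previous requests
for ($i = 0; $i < $offset && fgets($file_handle) !== false; $i++);

// collect this request's batch of values
$vals = array();
for ($i = 0; $i < $batchSize && ($line = fgetcsv($file_handle)) !== false; $i++) {
    foreach ($line as $col) {
        if (!empty($col)) {
            $vals[] = "('" . mysql_real_escape_string($col) . "', '')";
        }
    }
}
if (!empty($vals)) {
    mysql_query("insert into `action_6_weekly` Values " . implode(', ', $vals)) or die(mysql_error());
}

$done = feof($file_handle);
fclose($file_handle);

echo "Handled lines $offset to " . ($offset + $batchSize) . "<br>";
if (!$done) {
    // send the browser back for the next chunk
    echo "<script>location.href = 'import.php?offset=" . ($offset + $batchSize) . "';</script>";
} else {
    echo "Import finished.";
}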

iWantSimpleLife
+1  A: 
  1. Make sure there are no indexes on your table, as indexes will slow down inserts (add the indexes back after you've done all the inserts).
  2. Rather than creating a new SQL statement on each iteration of the loop, prepare the SQL statement outside the loop and execute that prepared statement with parameters inside the loop (sketched below). Depending on the database this can be heaps faster.

I've done the above when importing a large Access database into Postgres using Perl and got the insert time down to 30 seconds. I would have used an importer tool, but I wanted Perl to enforce some rules when inserting.
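
A minimal sketch of point 2 using PDO (an assumption on my part; the mysql_* API used in the question has no prepared statements, and the DSN and credentials below are placeholders):

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'root', '');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// prepare once, outside the loop...
$stmt = $pdo->prepare("INSERT INTO `action_6_weekly` VALUES (?, '')");

$file_handle = fopen("Weekly.csv", "r");
$pdo->beginTransaction();   // wrapping everything in one transaction also helps on InnoDB
while (($line = fgetcsv($file_handle)) !== false) {
    foreach ($line as $col) {
        if (!empty($col)) {
            $stmt->execute(array($col));   // ...and execute with parameters inside it
        }
    }
}
$pdo->commit();
fclose($file_handle);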

Matthew Lock
You can also disable indexes.
Cfreak