Scenario

I want to parse a large CSV file and insert its data into the database. The CSV file has approximately 100K rows of data.

Currently I am using fgetcsv to parse the file row by row and insert each row into the database, so I am hitting the database once for every line in the CSV file. That means roughly 100K database hits, which is not good from a performance point of view.

Current Code:

public function initiateInserts()
{
    //Open Large CSV File(min 100K rows) for parsing.
    $this->fin = fopen($file,'r') or die('Cannot open file');

    //Parsing Large CSV file to get data and initiate insertion into schema.
    while (($data=fgetcsv($this->fin,5000,";"))!==FALSE)
    {
        $query = "INSERT INTO dt_table (id, code, connectid, connectcode) 
                 VALUES (:id, :code, :connectid, :connectcode)";

        $stmt = $this->prepare($query);
        // Then, for each line : bind the parameters
        $stmt->bindValue(':id', $data[0], PDO::PARAM_INT);
        $stmt->bindValue(':code', $data[1], PDO::PARAM_INT);
        $stmt->bindValue(':connectid', $data[2], PDO::PARAM_INT);
        $stmt->bindValue(':connectcode', $data[3], PDO::PARAM_INT);

        // Execute the statement
        $stmt->execute();
        $this->checkForErrors($stmt);
    }
}

I am looking for a way to avoid hitting the database for every row of data: ideally I could prepare the query once, hit the database once, and populate it with all the inserts.

Any suggestions!

Note: This is the exact sample code that I am using, except that the real CSV file has more fields than just id, code, connectid and connectcode. I wanted to make sure I could explain the logic clearly, so I have used this simplified sample code here.

Thanks!

A: 

I cannot offer a full solution since I don't do PHP.

But you can store those INSERT strings in one array and send the data to the DB in batches of 100 or so.

Something like Conn.Execute(assembledInsert), where assembledInsert is the string built from the array of inserts.

Keep one connection open for the whole process and use only that one.

jaderanderson
I am not sure about .NET, so the Conn.Execute concept is hard for me to understand, but `$data` is an array and not a string, so I would appreciate it if you could elaborate a little more on your approach; it is hard to understand right now.
Rachel
I mean using an array composed of many $query strings that are already assembled. The $arr array stores the resulting queries from the loop described in the question. Then: $stmt = $this->prepare($buff); // assume $buff is a concatenation of 100 positions of the $arr array. Execute the statement: $stmt->execute(); $this->checkForErrors($stmt);
jaderanderson
This should reduce DB calls 100:1.
jaderanderson
Numenor's solution is close to mine; mine is just a more "batch"-oriented approach, using 100 inserts per DB access.
jaderanderson
I will try it out and let you know if it is an appropriate one.
Rachel
@jaderanderson: I tried this approach, building the query in the loop and calling execute outside the `while loop`, but it takes a long time to prepare the query and memory is exhausted several times; even after increasing the PHP memory limit to 100MB it still takes very long just to prepare the query. The other approach I am trying is to prepare the query outside the `while loop` and then call `$stmt = $this->execute()` inside the loop, but that is not inserting any records into the database. Refer: `http://stackoverflow.com/questions/2435390/recursively-mysql-query/2435567#2435567`
Rachel
Rachel: The problem is that you are sending too many inserts in one prepare, you see? I'm kind of new here so I cannot format this nicely, but hear me out: inside the main loop use a counter, name it T or whatever. When the counter reaches 100, run the prepare/execute commands inside an if block, reset $query to "", and end the if block. Get it? Doing this makes the prepare and execute lines send batches of 100 inserts, not 100K like Numenor's. EDIT: Let me know how it works; otherwise I'll send you another answer with some formatting :D
jaderanderson
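For illustration, a minimal PHP sketch of this batched approach, flushing every 100 rows. Instead of concatenating 100 separate INSERT statements, this variant builds one multi-row INSERT with bound placeholders, which also sidesteps the quoting/injection issue discussed below; the flushBatch helper name and the batch size of 100 are assumptions, not taken from the answers.

public function initiateInserts()
{
    //Open Large CSV File(min 100K rows) for parsing.
    $this->fin = fopen($file,'r') or die('Cannot open file');

    $batchSize = 100;   // rows buffered per database hit (assumption)
    $rows = array();    // rows waiting to be flushed

    while (($data=fgetcsv($this->fin,5000,";"))!==FALSE)
    {
        $rows[] = $data;
        if (count($rows) >= $batchSize) {
            $this->flushBatch($rows);   // one DB hit per 100 rows
            $rows = array();
        }
    }
    if (count($rows) > 0) {
        $this->flushBatch($rows);       // flush the remainder
    }
}

private function flushBatch(array $rows)
{
    // Build one INSERT with a (?,?,?,?) group per buffered row.
    $placeholders = rtrim(str_repeat('(?,?,?,?),', count($rows)), ',');
    $query = "INSERT INTO dt_table (id, code, connectid, connectcode) VALUES " . $placeholders;
    $stmt = $this->prepare($query);

    // Flatten the rows into one list of values for the positional placeholders.
    $values = array();
    foreach ($rows as $row) {
        $values[] = (int)$row[0];
        $values[] = (int)$row[1];
        $values[] = (int)$row[2];
        $values[] = (int)$row[3];
    }
    $stmt->execute($values);
    $this->checkForErrors($stmt);
}

Each flushBatch call is a single execute, so 100K rows become roughly 1,000 database hits instead of 100K.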
+1  A: 

In your while loop, only build the query string, and execute the statement outside the loop. Something like this should work (not sure about the syntax, since it's been a long time since I have written PHP, but it should work):

public function initiateInserts()
{
    //Open Large CSV File(min 100K rows) for parsing.
    $this->fin = fopen($file,'r') or die('Cannot open file');

    //Parsing Large CSV file to get data and initiate insertion into schema.
    $query = "";
    while (($data=fgetcsv($this->fin,5000,";"))!==FALSE)
    {
        $query = $query . "INSERT INTO dt_table (id, code, connectid, connectcode) 
                 VALUES (" . $data[0] . ", " . $data[1] . ", " . $data[2] . ", " . $data[3] . ")";
    }
     $stmt = $this->prepare($query);
     // Execute the statement
     $stmt->execute();
     $this->checkForErrors($stmt);
}
Numenor
Doing this, how would it prepare the query for each row in the CSV file? Also, I am not sure why we modified `:$data[0]` to `+$data[0]+`. Any suggestions?
Rachel
Because I just built a query string without using any PHP-specific object. I added the parameters while building the query string instead of binding them on the $stmt object; that's why I used $data[0] directly. The ":" is used to mark a placeholder variable, but the way I changed it, it's not a variable anymore, it's the value itself. "$query = $query + ..." is what concatenates the strings.
Numenor
If I do this, won't the query be vulnerable to SQL injection?
Rachel
It will; since it's a text file I didn't worry about it, but if you need to, you should change it.
Numenor
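For illustration, one way to harden the string-building above for this particular table: the original code binds every field as PDO::PARAM_INT, so the values can simply be forced to integers before they are concatenated. This is a hedged sketch; for non-numeric fields you would need PDO::quote() or bound placeholders instead.

    while (($data=fgetcsv($this->fin,5000,";"))!==FALSE)
    {
        // %d forces each value to an integer, so stray text in the CSV
        // cannot break out of the statement (the columns here are numeric).
        $query .= sprintf(
            "INSERT INTO dt_table (id, code, connectid, connectcode) VALUES (%d, %d, %d, %d); ",
            $data[0], $data[1], $data[2], $data[3]
        );
    }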
@Numenor: I am trying to take this approach, but the prepare statement takes a very long time and my memory gets exhausted; even if I increase the limit to 100MB it still does not execute, so I am not sure how well this approach would scale. Are there any other approaches we could implement to get a solution? I would also like to bring this approach to attention: `http://stackoverflow.com/questions/2435390/recursively-mysql-query/2435567#2435567`
Rachel
I corrected the concat operator. jaderanderson's approach is also nice: maybe execute every 100/1000/10000 inserts (whichever suits your memory needs) so you do not build one huge string to operate on.
Numenor
@Numenor: Can you elaborate in code how to store the queries in an array and hit the database every 100/1000 inserts?
Rachel
A: 

I think the point of prepared statements - besides security in query building - is "prepare once, execute many": prepared statements are meant to be run multiple times.

So you can try and take the prepare statement out of the loop and keep only parameter binding and execution in it.
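For illustration, a minimal sketch of that idea against the code from the question: the prepare moves above the loop, and only the binding and execution stay inside it. This still executes once per row, but the statement is parsed and prepared only once.

public function initiateInserts()
{
    //Open Large CSV File(min 100K rows) for parsing.
    $this->fin = fopen($file,'r') or die('Cannot open file');

    // Prepare the statement once, before the loop.
    $query = "INSERT INTO dt_table (id, code, connectid, connectcode) 
             VALUES (:id, :code, :connectid, :connectcode)";
    $stmt = $this->prepare($query);

    // Then, for each line: bind the parameters and execute.
    while (($data=fgetcsv($this->fin,5000,";"))!==FALSE)
    {
        $stmt->bindValue(':id', $data[0], PDO::PARAM_INT);
        $stmt->bindValue(':code', $data[1], PDO::PARAM_INT);
        $stmt->bindValue(':connectid', $data[2], PDO::PARAM_INT);
        $stmt->bindValue(':connectcode', $data[3], PDO::PARAM_INT);

        $stmt->execute();
        $this->checkForErrors($stmt);
    }
}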

kemp