views: 67
answers: 3

I am building a PHP web application that lets a user upload an MS Access database (exported as CSV), which is then translated and migrated into a MySQL database.

The MS Access database consists of one table called t_product with 100k rows. This table is not designed well. For example, the following query:

SELECT part_number, model_number FROM t_product

will return:

part_number    model_number
100            AX1000, AX1001, AX1002
101            CZ10, CZ220, MB100

As you can see, the model numbers are stored as comma-separated values instead of as individual records in a separate table. There are many more issues of this nature. I'm writing a script to clean this data before importing it into the MySQL database. The script will also map the existing Access columns onto a properly designed relational schema.

My issue is that my script takes too long to complete. Here's simplified code to explain what I'm doing:

$handle = fopen("MSAccess.csv", "r");

// get each row from the csv
while ($data = fgetcsv($handle, 1000, ","))
{
    mysql_query("INSERT INTO t_product (col1, col2, etc...) VALUES ($data[0], $data[1], etc...)");
    $prodId = mysql_insert_id();

    // using model as an example; there are other columns
    // with csv values that need to be broken up
    $arrModel = explode(',', $data[2]);
    foreach ($arrModel as $modelNumber)
        mysql_query("INSERT INTO t_model (product_id, model_number, etc...) VALUES ($prodId, '$modelNumber', etc...)");
}

The problem here is that each while-loop iteration makes a tremendous number of calls to the database. For every product record, I have to insert N model numbers, Y part numbers, X serial numbers etc...

I started another approach where I stored the whole CSV in an array, then wrote one big batch query like this:

$sql = "INSERT INTO t_product (col1, col2, etc...) values ";
foreach($arrParam as $val)
 $sql .= " ($val[0], $val[1], $val[2]), "

But I ran into memory exhaustion with this approach. I increased the memory limit to 64M and I'm still running out of memory.

What is the best way to tackle this problem?

Maybe I should write all my queries to a *.sql file first, then import that *.sql file into the MySQL database?

+1  A: 

This may not be the direction you want to go in at all, but you can generate the MySQL creation script directly from MS Access with the free MySQL Migration Toolkit.

Perhaps you could allow the user to upload the Access DB, and then have your PHP script call the Migration Toolkit?

dnagirl
The Migration Toolkit will also optimize its queries somewhat to speed up the bulk translation.
Ben S
A: 

If you're going to try optimizing the code you already have, I would try aggregating the INSERTs and see if that helps. This should be easy to add to your code. Something like this (C# pseudocode):

int flushCount = 0;

while (!done)
{
    // Build next query, concatenate to last set of queries

    if (++flushCount == 5)
    {
        // Flush queries to database

        // Reset query string to empty

        flushCount = 0;
    }
}

// Flush remaining queries to the database
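
To translate that into PHP: mysql_query() only accepts a single statement per call, so the usual way to aggregate the INSERTs is to build one multi-row INSERT and flush it every N rows, which also keeps memory bounded. A rough sketch, using hypothetical part_number/model_csv columns in place of your real layout:

$handle    = fopen("MSAccess.csv", "r");
$values    = array();
$batchSize = 500; // rows per INSERT; tune as needed

while (($data = fgetcsv($handle, 1000, ",")) !== false)
{
    $part  = mysql_real_escape_string($data[0]);
    $model = mysql_real_escape_string($data[2]);
    $values[] = "('$part', '$model')";

    if (count($values) == $batchSize)
    {
        // one multi-row INSERT instead of $batchSize single-row INSERTs
        mysql_query("INSERT INTO t_product (part_number, model_csv) VALUES " . implode(',', $values));
        $values = array(); // reset so memory stays bounded
    }
}

// flush whatever is left over
if (count($values) > 0)
{
    mysql_query("INSERT INTO t_product (part_number, model_csv) VALUES " . implode(',', $values));
}

fclose($handle);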
Jon Seigel
A: 

I decided to write all my queries into a .sql file. This gave me the opportunity to normalize the CSV file into a proper relational design. Afterwards, my PHP script called exec("mysql -h dbserver.com -u myuser -pmypass dbname < db.sql");

This solved my memory problems and it was much faster than running multiple queries from PHP.
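
A minimal sketch of what such a script can look like (the table and column names here are only placeholders, and the escaping is kept deliberately simple):

$in  = fopen("MSAccess.csv", "r");
$out = fopen("db.sql", "w");

while (($data = fgetcsv($in, 1000, ",")) !== false)
{
    $part = addslashes($data[0]);
    fwrite($out, "INSERT INTO t_product (part_number) VALUES ('$part');\n");
    fwrite($out, "SET @pid = LAST_INSERT_ID();\n"); // remember the product id

    // normalize the comma-separated model numbers into their own table
    foreach (explode(',', $data[2]) as $modelNumber)
    {
        $model = addslashes(trim($modelNumber));
        fwrite($out, "INSERT INTO t_model (product_id, model_number) VALUES (@pid, '$model');\n");
    }
}

fclose($in);
fclose($out);

// let the mysql client do the heavy lifting
exec("mysql -h dbserver.com -u myuser -pmypass dbname < db.sql");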

John