I am currently adding hundreds of thousands of rows of data, first to an MS Access table and then to a MySQL table.

I first tried with MS Access; it took less than 40 seconds. Then I tried with exactly the same source and the same table structure on MySQL, and it took 6 minutes and 40 seconds. That is ten times slower!!!

So is it a myth that a database server has better performance?

+1  A: 

Usually the most important performance aspect of a database is not how quickly you can insert data, but how quickly you can query it. MySQL has, I believe, a more powerful optimizer than MS Access and can make better use of indexes. An example is the loose index scan, which can give a factor of 10 or more speed-up for certain types of queries (see the sketch below).
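
(A minimal sketch of the kind of query that benefits, using a hypothetical orders table; EXPLAIN reports "Using index for group-by" when the loose index scan applies:)

CREATE TABLE orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  order_date DATE NOT NULL,
  INDEX idx_cust_date (customer_id, order_date)
);

-- Grouping on a leftmost prefix of the index lets MySQL jump from group
-- to group instead of scanning every row.
EXPLAIN SELECT customer_id, MIN(order_date)
FROM orders
GROUP BY customer_id;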

Also, the method you use to insert data affects how long the insert takes. For example, it will typically be faster to use a bulk insert than lots of individual INSERT statements. Disabling indexes while inserting and re-enabling them afterwards can also improve performance, as in the sketch below.
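
(A sketch with the same hypothetical three-column table used later in this thread; note that DISABLE KEYS defers only non-unique index maintenance, and only on MyISAM tables:)

-- One multi-row INSERT instead of many single-row statements.
INSERT INTO mytable VALUES
  (100, 'Joe', 34),
  (101, 'Fran', 23),
  (102, 'Alma', 41);

-- Defer index maintenance during a large load, then rebuild once at the end.
ALTER TABLE mytable DISABLE KEYS;
-- ... bulk inserts here ...
ALTER TABLE mytable ENABLE KEYS;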

Mark Byers
You're right, and when I've finished uploading I will certainly compare query speed. If I weren't limited by the 2 GB MS Access size cap I wouldn't have used MySQL. In fact, even if querying is much slower, it might still be in my interest to use multiple 2 GB MS Access DBs and consolidate the results. Thanks for the index suggestion, I will try it.
Of course MS Access isn't suited for everything, but I'm surprised by the huge performance gap, at least for inserts into a very simple table.
@asksuperu: Sure, that sounds like a good idea. But please, before you go around saying that database X sucks compared to database Y for queries, post full information about your testing so others can reproduce your results and, more importantly, review and critique your methodology. A flawed test with no peer review is what causes these myths in the first place.
Mark Byers
Well, my purpose is not really to make a comparison; it's to get a real-world job done, which is a rather widespread use case. :) So my purpose is to find the best solution to this, not really to compare, though as a side effect it is a comparison.
+1  A: 

Does MySQL provide any SQL trace tools so you can see what Access is sending it? From my experience with using Access with SQL Server via ODBC I can tell you that Jet makes some seemingly strange decisions with bulk inserts. What it does is send an insert for each record, instead of a batch insert for all the records. This makes it massively slower, but it does mean that it can't tie up the SQL Server with a long update (and corresponding table locks, etc.).
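
(For what it's worth, MySQL's general query log can serve as such a trace: it records every statement the server receives, so it will show whether Jet is sending one INSERT per row. A sketch, assuming MySQL 5.1+ where the setting can be toggled at runtime:)

SET GLOBAL general_log = 'ON';
SET GLOBAL log_output = 'TABLE';  -- or 'FILE'
-- ... run the Access-side insert, then inspect what actually arrived:
SELECT event_time, argument FROM mysql.general_log ORDER BY event_time;
SET GLOBAL general_log = 'OFF';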

It's dumb from the standpoint of your insert, but smart from the standpoint of being a good client/server citizen -- it's allowing the SQL Server to decide how to serialize the requested commands and interleave them with those from other users. This means locks are shorter than they would be on a bulk insert.

With SQL Server, you can use ADO to do the trick and force it to process the insert as a batch. I don't know if there's any way to do that with MySQL.

One thing to consider:

If the source and destination tables are both in MySQL, a passthrough query should cause it to be handled entirely by MySQL.
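
(A sketch with hypothetical table names; because a pass-through query's text goes straight to the server, a single INSERT ... SELECT like this runs entirely on the MySQL side:)

-- Saved as an Access pass-through query; executed by MySQL, not by Jet.
INSERT INTO dest_table (id, name, age)
SELECT id, name, age
FROM staging_table;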

David-W-Fenton
Probably MySQL has trace tools, but I'm really not a specialist in them. Your remark is interesting: I intended to also try SQL Server, but since you seem to say it would be just as slow, it wouldn't be any more worthwhile than MySQL. So good idea: an SQL pass-through would be better, or a bulk insert as Barry suggests.
Could you detail "you can use ADO to do the trick and force it to process the insert as a batch"? I'm currently using DAO and am not accustomed to ADO.
The source is multiple MS Access files (1 per week, 250 MB each). I have no control over this format as it comes from another department.
ADO offers a batch mode that DAO lacks, so there's no shame in using ADO for that particular purpose. I'd have to look up how to do it, but it would surely be some parameter for your connection object or for the execute method.
David-W-Fenton
With multiple Access files as the source being appended into MySQL, perhaps MySQL can read data from ODBC data sources. If so, then you might be able to use the IN statement in your SQL to have MySQL retrieve all the data on its side of the operation (instead of letting Jet break it down into individual statements).
David-W-Fenton
One thing you could do if you switched to SQL Server would be to mount the Access files as linked servers in SQL Server, and then you'd be able to do it all with a pass-through and let SQL Server sort it out (see the sketch below). MySQL 5 has a form of linked servers, but if I'm remembering correctly, it's limited to using other MySQL servers and can't use other data sources.
David-W-Fenton
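
(A sketch of that linked-server route, with hypothetical file paths and table names:)

-- Register one weekly .mdb file as a linked server via the Jet OLE DB provider.
EXEC sp_addlinkedserver
  @server = 'AccessWeek1',
  @srvproduct = 'Access',
  @provider = 'Microsoft.Jet.OLEDB.4.0',
  @datasrc = 'C:\data\week1.mdb';

-- An Access linked server is addressed with a four-part name that omits
-- the catalog and schema parts.
INSERT INTO dbo.mytable (id, name, age)
SELECT id, name, age
FROM AccessWeek1...mydata;
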
+3  A: 

Executing thousands of independent INSERTs is going to run very slowly. Since MySQL is a multi-user, transactional database, there is a lot more going on during each query than there is in Access. Each INSERT operation on a SQL server goes through the following steps:

  1. Decode and parse the query.
  2. Open the table for writing, establishing locks if necessary.
  3. Insert the new row.
  4. Update the indexes, if necessary.
  5. Save the table to disk.

Ideally, you want to perform steps 1, 2, 4, and 5 as few times as possible. MySQL has some features that will help you.

PREPARE your queries

By preparing a query that you are going to use repeatedly, you perform step 1 just once. Here's how:

PREPARE myinsert FROM 'INSERT INTO mytable VALUES (?, ?, ?)';
SET @id = 100;
SET @name = 'Joe';
SET @age = 34;
EXECUTE myinsert USING @id, @name, @age;
SET @id = 101;
SET @name = 'Fran';
SET @age = 23;
EXECUTE myinsert USING @id, @name, @age;
# Repeat until done
DEALLOCATE PREPARE myinsert; 

Read more about PREPARE at the mysql.com site.

Use transactions

Combine several (or several hundred) INSERTs into a transaction. The server only has to do steps 2, 4, and 5 once per transaction.

PREPARE myinsert FROM 'INSERT INTO mytable VALUES (?, ?, ?)';

START TRANSACTION;
SET @id = 100;
SET @name = 'Joe';
SET @age = 34;
EXECUTE myinsert USING @id, @name, @age;
SET @id = 101;
SET @name = 'Fran';
SET @age = 23;
EXECUTE myinsert USING @id, @name, @age;
# Repeat a hundred times
COMMIT;

START TRANSACTION;
SET ...
SET ...
EXECUTE ...;
# Repeat a hundred times
COMMIT;

# Repeat transactions until done

DEALLOCATE PREPARE myinsert;

Read more about transactions.

Load your table from a file

Instead of doing thousands of INSERTS, do one batch upload of your data. If your data is in a delimited file, such as a CSV, use the LOAD DATA statement.

LOAD DATA LOCAL INFILE '/full/path/to/file/mydata.csv'
INTO TABLE `mytable`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n';

Here's a link to the MySQL page on LOAD DATA.

Barry Brown
Thank you for this thorough tutorial, I will try it. In fact I have no choice but to use MySQL. Just a remark: MS Access also supports multiple users (theoretically up to 255; in practice up to around 20 is OK) and even transactions; in fact I have already put transactions in place. I rather think the slowness is due 1) to ODBC and 2) to the fact that MS Access is a local file-based system that doesn't need to go through a TCP/IP protocol layer to reach a server. I should also test with Oracle when I have time, but the free version of Oracle is almost as limited as Access, so it is not an interesting option either.