views: 250
answers: 5

What is the fastest method to fill a database table with 10 million rows? I'm asking about the technique, but also about any specific database engine that would allow this to be done as fast as possible. I'm not requiring the data to be indexed during this initial population.

A: 

Use MySQL or MS SQL and built-in functions to generate the records inside the database engine. Or generate a text file (in a CSV-like format) and then use the bulk copy functionality.
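
A minimal sketch of the first approach for MySQL, assuming a hypothetical table test_data(id, val); wrapping the loop in one transaction avoids committing ten million separate inserts:

    -- Minimal MySQL sketch (table and column names are assumptions):
    -- generate rows inside the engine with a stored procedure.
    DELIMITER //
    CREATE PROCEDURE fill_test_data(IN row_count INT)
    BEGIN
      DECLARE i INT DEFAULT 0;
      START TRANSACTION;
      WHILE i < row_count DO
        INSERT INTO test_data (id, val) VALUES (i, MD5(RAND()));
        SET i = i + 1;
      END WHILE;
      COMMIT;
    END //
    DELIMITER ;

    CALL fill_test_data(10000000);

For the bulk-copy route, LOAD DATA INFILE in MySQL and bcp or BULK INSERT in SQL Server are the usual entry points.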

Elalfer
+2  A: 

The fastest method is probably running an INSERT statement with a SELECT FROM. I've generated test data to populate tables from other databases, and even from the same database, a number of times. But it all depends on the nature and availability of your own data. In my case I had enough rows of collected data that a few SELECT/INSERT routines, with random row selection applied half-cleverly against real data, yielded decent test data quickly. In cases where the table data was uniquely identifying, I used intermediate tables and frequency-distribution sorting to eliminate things like uncommon names (I dropped instances where a GROUP BY count was less than or equal to 2).
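
A rough sketch of that INSERT ... SELECT idea in MySQL syntax, including the frequency cut-off; all table and column names are illustrative:

    -- Copy randomized rows from real data into a test table, skipping
    -- last names that occur two times or fewer (all names are hypothetical).
    INSERT INTO test_customers (first_name, last_name, city)
    SELECT r.first_name, r.last_name, r.city
    FROM   real_customers r
    JOIN  (SELECT last_name
           FROM   real_customers
           GROUP  BY last_name
           HAVING COUNT(*) > 2) common ON common.last_name = r.last_name
    ORDER  BY RAND()        -- NEWID() on SQL Server
    LIMIT  1000000;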

Also, Red Gate actually provides a utility to do just what you're asking. It's not free, and I think it's SQL Server-specific, but their tools are top notch and well worth the cost. There's also a free trial period.

If you don't want to pay for their utility, you could conceivably build your own pretty quickly. What they do is not magic by any means. A decent developer should be able to knock out a similarly featured, if rough and hard-coded, version of the app in a day or two...

Paul Sasik
+1  A: 

You might be interested in the answers to this question. It looks at uploading a massive CSV file to a SQL Server 2005 database. For SQL Server, it appears that an SSIS package (the successor to DTS) is the fastest way to bulk import data into a database.
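
Even without SSIS, a plain BULK INSERT covers the basic case; a minimal T-SQL sketch with an assumed table name and file path:

    -- T-SQL sketch of a minimally logged CSV import (names and paths assumed).
    BULK INSERT dbo.test_data
    FROM 'C:\data\test_data.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        TABLOCK,               -- table lock enables minimal logging
        BATCHSIZE = 100000     -- commit in chunks rather than one huge transaction
    );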

MagicAndi
+2  A: 

Using SQL to load a lot of data into a database will usually result in poor performance. In order to do things quickly, you need to go around the SQL engine. Most databases (including Firebird I think) have the ability to backup all the data into a text (or maybe XML) file and to restore the entire database from such a dump file. Since the restoration process doesn't need to be transaction aware and the data isn't represented as SQL, it is usually very quick.

I would write a script that generates a dump file by hand, and then use the database's restore utility to load the data.

After a bit of searching I found FBExport, which seems to be able to do exactly that - you'll just need to generate a CSV file and then use the FBExport tool to import that data into your database.

gooli
+1  A: 

It entirely depends on your DB. For instance, Oracle has something called direct path load (http://download.oracle.com/docs/cd/B10501%5F01/server.920/a96652/ch09.htm), which effectively disables indexing and, if I understand correctly, builds the binary structures that will be written to disk on the client side rather than sending SQL over the wire.
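
SQL*Loader is the usual client-side tool for that; Oracle also exposes direct-path writes from plain SQL through the APPEND hint. A rough sketch, with the table, columns, and row generator all assumed:

    -- Oracle direct-path INSERT sketch (hypothetical table big_table(id, val)).
    -- The APPEND hint writes formatted blocks above the high-water mark,
    -- bypassing the normal buffered insert path.
    INSERT /*+ APPEND */ INTO big_table (id, val)
    SELECT LEVEL, DBMS_RANDOM.STRING('A', 20)
    FROM   dual
    CONNECT BY LEVEL <= 10000000;
    COMMIT;  -- required before the table can be queried again in this session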

Combined with partitioning and rebuilding indexes per partition, we were able to load a 1-billion-row (I kid you not) database in relatively short order. 10 million rows is nothing.

James