I am parsing a log and inserting it into either MySQL or SQLite using SQLAlchemy and Python. Right now I open a connection to the DB, and as I loop over each line, I insert it after it is parsed (this is just one big table right now; I'm not very experienced with SQL). I then close the connection when the loop is done. The summarized code is:

log_table = schema.Table('log_table', metadata,
                         schema.Column('id', types.Integer, primary_key=True),
                         schema.Column('time', types.DateTime),
                         schema.Column('ip', types.String(length=15))
....
engine = create_engine(...)
metadata.bind = engine
connection = engine.connect()
....
for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        fields = m.groupdict()
        pythonified = pythoninfy_log(fields)  # Turn them into ints, datetimes, etc.
        if use_sql:
            ins = log_table.insert(values=pythonified)
            connection.execute(ins)
            parsed += 1

My two questions are:

  • Is there a way to speed up the inserts within this basic framework? Maybe have a Queue of inserts and some insertion threads, some sort of bulk inserts, etc?
  • When I used MySQL, the insert time for ~1.2 million records was 15 minutes. With SQLite, the insert time was a little over an hour. Does that time difference between the database engines seem about right, or does it mean I am doing something very wrong?
+2  A: 

Without knowing the table engine (MyISAM? InnoDB?), schema, and indexes, it's hard to comment on the specific differences between the two databases you're using.

However, when using MySQL like this, you will likely find that it is far faster to write your data out to a temporary text file and then use the LOAD DATA INFILE syntax to load it all into your database. It looks like you can call the execute method on your connection object to run the SQL necessary to do this.
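
A rough sketch of that approach (not the poster's code; it assumes pythoninfy_log returns a dict with 'time' and 'ip' keys, that the dump path is writable, and that LOCAL infile is enabled on both the MySQL client and server):

import csv

dump_path = '/tmp/parsed_log.csv'  # hypothetical location for the intermediate file

# 1. Write the parsed rows out to a plain CSV file.
with open(dump_path, 'w') as f:
    writer = csv.writer(f)
    for line in file_to_parse:
        m = line_regex.match(line)
        if m:
            pythonified = pythoninfy_log(m.groupdict())
            writer.writerow([pythonified['time'], pythonified['ip']])

# 2. Hand the whole file to MySQL in a single statement.
connection.execute(
    "LOAD DATA LOCAL INFILE '%s' INTO TABLE log_table "
    "FIELDS TERMINATED BY ',' "
    "(time, ip)" % dump_path
)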

Further, if you are dead set on adding things row by row, and you're recreating the table every time, you can verify key constraints in your program and add those constraints only after all rows have been inserted, saving the DB the time of doing constraints checks on every insert.
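
For example (hypothetical, since the question's schema only has a primary key: suppose you wanted a unique index on ip; the index name ix_log_ip is made up), you could enforce uniqueness in Python during the load and only build the index afterwards, using the same connection.execute style as above:

seen_ips = set()
rows = []
for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        pythonified = pythoninfy_log(m.groupdict())
        # Enforce the "unique ip" rule in Python instead of per-row in the DB.
        if pythonified['ip'] not in seen_ips:
            seen_ips.add(pythonified['ip'])
            rows.append(pythonified)

connection.execute(log_table.insert(), rows)  # one executemany-style bulk insert

# Only now ask the database to build (and from here on enforce) the index.
connection.execute("CREATE UNIQUE INDEX ix_log_ip ON log_table (ip)")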

Conspicuous Compiler
"you can verify key constraints in your program and add those constraints only after all rows have been inserted, saving the DB the time of doing constraints checks on every insert." Can you break that down a little for me, that part went over my head :-P
Kyle Brandt
@Kyle It's hard to give specifics without a table schema to work from. But, for example, if you have any UNIQUE indexes, that uniqueness is a constraint on the table. Every time you insert a row, the database ensures that there isn't another row that conflicts with it. Since you have only one table, you don't have to worry about foreign key constraints, but if you add those later, this would apply to them as well.
Conspicuous Compiler
+2  A: 

The big thing you should try is putting a transaction around multiple inserts, since it is the commit to disk that really takes the time. You'll need to decide on the batching level, but a crude first attempt would be to wrap a single transaction around the whole lot.
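
A minimal sketch of that idea, wrapping the question's existing loop in a single Connection-level transaction (connection.begin() is the plain SQLAlchemy Connection API; committing every N rows instead of once at the end is a straightforward refinement):

trans = connection.begin()  # one transaction around the whole load
try:
    for line in file_to_parse:
        m = line_regex.match(line)
        if m:
            pythonified = pythoninfy_log(m.groupdict())
            connection.execute(log_table.insert(values=pythonified))
    trans.commit()   # a single commit, so the expensive flush to disk happens once
except Exception:
    trans.rollback()
    raise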

Donal Fellows
So something like an array of the ins objects I create, and then execute when the array is full? Or is that not really what you mean?
Kyle Brandt
@Kyle: You'll need to create a transaction with transaction = session.create_transaction(), and then later call transaction.commit(). See http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html and scroll down to the "Transactions" heading.
Conspicuous Compiler
Not exactly what I did, but it was the idea behind what I did, and I like to give people points :-). I explained what I did below.
Kyle Brandt
A: 

I did the following to achieve some batching:

inserts = []
insert_every = 1000
parsed = 0
for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        fields = m.groupdict()
        pythonified = pythoninfy_log(fields)  # turn them into ints, datetimes, etc.
        if use_sql:  # This uses globals, ick :-/
            inserts.append(pythonified)
            if (parsed % insert_every) == 0:
                # Hand SQLAlchemy the whole batch at once (executemany style).
                connection.execute(log_table.insert(), inserts)
                inserts = []
            parsed += 1
if use_sql and len(inserts) > 0:
    # Flush whatever is left over after the loop.
    connection.execute(log_table.insert(), inserts)

This doesn't use transactions, but in a very lazy manner it let me turn the insert/parse stage from ~13 seconds down to ~2 seconds with the MySQL backend on a smaller sample. I will see what the difference between MySQL and SQLite is with this change using the full sample.

I found the basic information for this here.

Results (insert times in minutes):

Engine    Non-Grouped Insert    Grouped Insert
SQLite    61                    8
MySQL     15                    2.5

I didn't flush the filesystem cache between the MySQL and SQLite runs, so the source text file may still have been cached for the second run, but I don't think that makes a significant difference.

Kyle Brandt