Hello.

I am having problems with a Python script which basically just analyses a CSV file line by line and inserts each line into a MySQL table using a for loop:

import csv
import sys

f = csv.reader(open(filePath, "rb"))  # "rb" is the recommended mode for the csv module in Python 2
i = 1
for line in f:
    if i > skipLines:
        vals = nullify(line)
        try:
            cursor.execute(query, vals)
        except TypeError:
            sys.exc_clear()  # Python 2 only: clears the handled exception's state
    i += 1
return

Where the query is of the form:

query = ("insert into %s" % tableName) + (" values (%s)" % placeholders)
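
The table name cannot be passed as a DB-API parameter, which is why it is interpolated with % here; only the row values go through cursor.execute(). The placeholders string would presumably be built with one %s marker per CSV column, along these lines (numColumns is a stand-in, not a name from the original script):

placeholders = ", ".join(["%s"] * numColumns)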

This works perfectly with every file it is used for, with one exception - the largest file. It stops at a different point each time - sometimes it reaches 600,000 records, sometimes 900,000 - but there are about 4,000,000 records in total.

I can't figure out why it is doing this. The table type is MyISAM and there is plenty of disk space available. The table reaches about 35MB when it stops. max_allowed_packet is set to 16MB, but I don't think that is the problem, as the script executes the inserts line by line.

Does anyone have any idea what could be causing this? I am not sure whether Python, MySQL or the MySQLdb module is responsible.

Thanks in advance.

+1  A: 

Hi,

Have you tried MySQL's LOAD DATA INFILE statement?

query = "LOAD DATA INFILE '/path/to/file' INTO TABLE atable FIELDS TERMINATED BY ',' ENCLOSED BY '\"' ESCAPED BY '\\\\'"
cursor.execute( query )

You can always pre-process the CSV file (at least that's what I do :)
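
A minimal sketch of such a pre-processing pass, assuming you want the same empty-field-to-NULL behaviour as the nullify() helper in the question (the paths are placeholders; an unquoted \N in the data file is read by LOAD DATA INFILE as NULL):

import csv

src = open("/path/to/file", "rb")
dst = open("/path/to/file.clean", "wb")
writer = csv.writer(dst)
for row in csv.reader(src):
    # Rewrite empty fields as \N so LOAD DATA INFILE stores them as NULL
    writer.writerow([field if field != "" else r"\N" for field in row])
src.close()
dst.close()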

Another thing worth trying would be bulk inserts. You could try to insert multiple rows with one query:

INSERT INTO x (a,b)
VALUES 
('1', 'one'),
('2', 'two'),  
('3', 'three')
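
From Python, one way to get that effect is MySQLdb's executemany(), which takes a parameterised INSERT and a sequence of row tuples and batches them (the table and column names below are illustrative, not from the original script):

rows = [('1', 'one'), ('2', 'two'), ('3', 'three')]
cursor.executemany("INSERT INTO x (a, b) VALUES (%s, %s)", rows)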

Oh, yeah, and you don't need to commit since it's the MyISAM engine.

Dave
I was aware of `LOAD` but decided against using it, thinking it would be better to process and insert one row at a time. However, now that you mention it, I think I prefer your solution; it reduces the work MySQL has to do and makes it easier to preserve atomicity. It also makes archiving easier, as I can just put the modified CSVs somewhere. Thanks :)
edanfalls
A: 

As S. Lott alluded to, aren't cursors used as handles into transactions?

So at any point the database is keeping all of those pending inserts around in case you roll them back.

You may simply have too many inserts for one transaction. Try committing the transaction every couple of thousand inserts.
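
A minimal sketch of that idea, assuming conn is the MySQLdb connection object and ignoring the skipLines bookkeeping from the question (the batch size of 2,000 is arbitrary):

count = 0
for line in f:
    cursor.execute(query, nullify(line))
    count += 1
    if count % 2000 == 0:
        conn.commit()  # flush this batch of pending inserts
conn.commit()  # commit the final partial batch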

tullaman
Yes, I did try this to make sure. It turned out that I had autocommit on anyway, so every MySQL query was automatically wrapped in its own transaction.
edanfalls