views:

103

answers:

2

i am inserting records to sql server from python using pymssql. The database takes 2 milliseconds to execute a query, yet it insert 6 rows per second. The only problem is at code side. how to optimize following code or what is the fastest method to insert records.

def save(self):
    conn = pymssql.connect(host=dbHost, user=dbUser, 
                           password=dbPassword, database=dbName, as_dict=True)
    cur = conn.cursor()

    self.pageURL = self.pageURL.replace("'","''")

    query = "my query is there"               
    cur.execute(query)

    conn.commit() 
    conn.close()
+4  A: 

It looks like you're creating a new connection per insert there. That's probably the major reason for the slowdown: building new connections is typically quite slow. Create the connection outside the method and you should see a large improvement. You can also create a cursor outside function and re-use it, which will be another speedup.

Depending on your situation, you may also want to use the same transaction for more than a single insertion. This changes the behaviour a little -- since a transaction is supposed to be atomic and either completely succeeds or completely fails -- but committing a transaction is typically a slow operation, because it has to be certain the whole operation succeeded.

Thomas Wouters
@Thomas Thanks It works. I have already tried by the connection object outside method but speed was still low. when set autocommit property to True, than the speed is improved.
+1  A: 

In addition to Thomas' great advice,
I'd suggest you look into executemany()*, e.g.:

cur.executemany("INSERT INTO persons VALUES(%d, %s)", 
    [ (1, 'John Doe'), (2, 'Jane Doe') ])

...where the second argument of executemany() should be a sequence of rows to insert.

This brings up another point:
You probably want to send your query and query parameters as separate arguments to either execute() or executemany(). This will allow the PyMSSQL module to handle any quoting issues for you.

*executemany() as described in the Python DB-API:

.executemany(operation,seq_of_parameters)
Prepare a database operation (query or command) and then execute it against all parameter sequences or mappings found in the sequence seq_of_parameters.

Adam Bernier
thanks for the tip