views:

36

answers:

2

I am using ODP.NET for loading data into Oracle. I am bulking inserts into groups of a 1000 rows each call.

Is there any performance benefits in calling my load method asynchronously? So say I want to insert 10000 rows, instead of making 10 calls synchronously I make 10 calls asynchronously.

My database is using ASSM right now but otherwise plenty of freelists are used of course. The database server has several cores as well.

My initial tests seem to point to a performance increase, but maybe there is something I cannot see? Potential deadlock or contention issues?

Of course, there is added complexity in handling transactions and such doing my load this way.

+1  A: 

Hi Karl,

You should see increased performance (up to a limit) if you insert concurrently (i.e. by several sessions). You haven't described how you insert asynchronously but I suppose you open multiple connexions to the DB. If you have sufficient freelists (or you are using ASSM) the bottleneck should be the disk write speed. Commit only once per session (at the end of the workload) and you shouldn't run into contention issues.

You are already seeing a performance increase, only you can tell if it is worth the effort.

Vincent Malgrat
Yes, I open multiple connections to the DB. The performance benefit seems to be a bit spotty right now. I'm performing tests over the network so I need to be doing it locally to really see if there is any difference. Sadly it seems that I encounter a bit of ORA errors sometimes. Really hard to troubleshoot since I'm loading the same test-data over and over again. And some loads it complains about data type mismatches.
Karl Trumstedt
@Karl Trumstedt: it is hard to tell what happens without the precise error message and an example of your code. Try to determine the number of parallel sessions that performs best. You should observe diminishing returns as the number of sessions increases (and eventually decreased throughput if you create too many connections).
Vincent Malgrat
+1  A: 

Depends on where the bottleneck is.

If the bottleneck is CPU and you have multiple CPUs/cores, then multiple sessions can use multiple the processors which should speed things up.

If the bottleneck is network (between the 'app' box and the 'db' box) then the advantages are going to be less apparent. You can get some benefit (eg session 'a' hogs the network while session 'b' is using disk/cpu and vice versa).

Disk is somewhere in between. Disk writes are handled in the background by the DBWR process and the LGWR for the redo log. Again, if the sessions are using different resources at different times, you'll see some benefit.

You may get contention, especially on indexes on 'sequential' numbers or dates, where parallel sessions may want to write to the same bit of the index.

What sort of data volumes are you loading ? If it is very large, you may get a benefit from transferring files (eg CSV) over and uploading through an external table.

Gary