views:

209

answers:

8

I'm creating a windows console application that will read text file line by line and extract the data from the string that is fixed length data. The application is written as windows application for now but will convert to windows console app later on. I've notice that it take a while for the application to run from reading the text, inserting into the database and exporting out of the database.

Would it help speed up the process if i use multiple threads? I'm thinking one thread to read the data and another thread to do inserting the data to the database.

any suggestion?

edit: the application is going to be done in VB.net

+1  A: 

It's impossible to say in general - the only way to find out is to build the app and test the performance. The bottleneck is likely to be the DB insert, but whether multi-threading will speed thibngs up depennds on a host of factors:

  • are your app and the db server running on thge same machine?
  • do they use the same disk?
  • can one insert cause contention with another?

You get the idea. Having said that, I have written servers in the finance industry where multi-threading the DB access did make a huge difference. But these were talking to a gigantic Sun enterprise server which had database I/Os to spare, so flooding it with requests from a multi-threaded app made sense.

anon
+1  A: 

With multiple threads, you may be able to get some overlap - one thread is reading from disk while another thread is doing a database insert. I'm guessing that you probably won't see that much of an improvement - unless you're reading very large files, most of your time is probably spent inserting into the database, and the time in disk I/O is just noise.

Michael
A: 

You probably wouldn't gain much from that, as the task you're outlining here is pretty much sequential in nature.

Joey
A: 

What are you using to build windows app? If you are using .Net use thread pool. There is nice library called Power threading developed by Jeff Richter.Download

Also, understand how threads work in windows OS. Adding multiple threads sometimes may not help and I often not encourage it.

CodeToGlory
A: 

You won't know if multithreading will help until you build the application, but it seems that you really just want better performance. Before doing anything you need to measure the performance of the application. Perhaps there is some code that is inefficient, so use a profiler to identify bottlenecks.

BrianLy
A: 

multiple Threads not always improve the performance. If the activities can truly be executed in parallel then only the basic multithreading works. If lots of IO operations are being done in reading data then its worth to give a try. Best way is to prototype and verify.

aJ
+4  A: 

I will assume this is an SQL database.

Your problem is likely to be that you are doing one item at a time. SQL hates that. SQL and SQL databases operate on sets of items.

So, open a transaction, read and insert 1,000 items. Save those items in case the transaction commit fails for some reason so that you can retry.

I have managed to speed up some Perl scripts doing work that sounds similar to your description by over 20x with this technique.

I do not know the Microsoft library that you are using, but here is a sample in Perl using DBI. The parts that make it work are AutoCommit => 0 and $dbh->commit.

#!/usr/bin/perl

use strict;
use DBI;

my $dbname = 'urls';
my $user = 'postgres';
my $pass = '';

my $dbh = DBI->connect(
    "DBI:Pg:dbname=$dbname",
    $user,
    $pass,
    { 'RaiseError' => 1, AutoCommit => 0 }
);

my $insert = $dbh->prepare('
    INSERT INTO todo (domain, path)
    VALUES (?, ?)
');

my $count = 0;
while(<>) {
    if( $count++ % 1000 == 0) {
     $dbh->commit;
    }
    chomp;
    my ($one, $two) = split;
    $insert->execute($one, $two);
}
$dbh->commit;
$dbh->disconnect;
Zan Lynx
Can you give me some more detail on how to do with set of items? A tutorial or example will be great. BTW, I'm using Microsoft Enterprise library to insert the data. Thank.
Jack
+1  A: 

Comitting data to the database is a time intensive operation. Try collecting items in batches (say 1000) and submit these batches to the database rather than submitting the items one by one. This should improve your performance. Multithreading is overkill for this type of application.

adeel825