As others have pointed out, you need to work out exactly where your bottleneck is and why you're using threading in the first place.
Moving to multiple threads can improve performance, but if every thread updates the same DataTable, the DataTable becomes the limit: only one thread can write to it at a time (which you enforce with a lock), so you're still fundamentally processing in sequence.
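To make that concrete, here's a minimal sketch (the table, lock object, and helper name are hypothetical): every writer has to take the same lock, so the "parallel" writers still add rows one at a time.

using System.Data;

static readonly object _tableLock = new object();
static readonly DataTable _sharedTable = new DataTable();

static void AddRowFromWorkerThread( object[] values )
{
    // only one thread can hold the lock at a time, so no matter how many
    // worker threads you start, rows go into the table one after another
    lock( _tableLock )
    {
        _sharedTable.Rows.Add( values );
    }
}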
On the other hand, most databases are designed for many simultaneous connections on many threads, and are highly tuned for exactly that. If you still want to use multiple threads, give each thread its own connection to the database and let it do its own processing.
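A rough sketch of the one-connection-per-thread idea, assuming SQL Server here; the connection string, table, and column names are just placeholders for whatever your schema actually uses:

using System.Data.SqlClient;

static void WriteResultWithOwnConnection( string connectionString, string fileName, string result )
{
    // each worker thread calls this with its own connection,
    // so no lock is needed around the database work
    using( var connection = new SqlConnection( connectionString ) )
    using( var command = new SqlCommand(
        "INSERT INTO Results ( FileName, Payload ) VALUES ( @file, @payload )", connection ) )
    {
        command.Parameters.AddWithValue( "@file", fileName );
        command.Parameters.AddWithValue( "@payload", result );
        connection.Open();
        command.ExecuteNonQuery();
    }
}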
Now, depending on the kind of processing going on, your bottleneck may be in opening and processing the file, and not in the database update.
One way to split things up:
- Put all the file names to be processed into a filename Queue.
- Create a thread (or threads) to pull an item off the filename Queue, open, parse, and process the file, and push the result onto a result Queue.
- Have another thread take the results from the result Queue, and insert them into the database.
These can run simultaneously: the database thread won't update anything until there's a result waiting, and simply waits in the meantime.
This approach also lets you see exactly who is waiting on whom. If the read/process-file side is slow, create more threads for it; if the database-insert side is slow, create more threads for that. The only thing that needs synchronizing is access to the queues.
So, roughly (with the file reading and database parts left as stubs):
using System;
using System.Collections.Generic;
using System.Threading;

class FileImporter
{
    // shared state; the queues are always accessed under a lock on the queue itself
    static Queue<string> _filesToProcess = new Queue<string>();
    static Queue<string> _results = new Queue<string>();
    static Thread _fileProcessingThread = new Thread( ProcessFiles );
    static Thread _databaseUpdatingThread = new Thread( UpdateDatabase );
    static volatile bool _finished = false; // volatile so UpdateDatabase sees the change

    static void Main()
    {
        foreach( string fileName in GetFileNamesToProcess() )
        {
            _filesToProcess.Enqueue( fileName );
        }

        _fileProcessingThread.Start();
        _databaseUpdatingThread.Start();

        // if we want to wait until they're both finished
        _fileProcessingThread.Join();
        _databaseUpdatingThread.Join();
        Console.WriteLine( "Done" );
    }

    static void ProcessFiles()
    {
        while( true )
        {
            string fileToProcess = null;
            // check and dequeue inside the same lock, so two processing
            // threads can never both try to grab the last item
            lock( _filesToProcess )
            {
                if( _filesToProcess.Count > 0 )
                    fileToProcess = _filesToProcess.Dequeue();
            }
            if( fileToProcess == null )
                break; // nothing left to do

            string resultAsString = ProcessFileAndGetResult( fileToProcess );
            lock( _results ){ _results.Enqueue( resultAsString ); }

            Thread.Sleep( 1 ); // prevent the CPU from being 100%
        }
        // with more than one processing thread, replace this bool with a counter
        // (e.g. Interlocked.Decrement) so only the *last* thread signals "finished"
        _finished = true;
    }

    static void UpdateDatabase()
    {
        while( true )
        {
            // read the flag *before* checking the queue, so an item enqueued
            // just before _finished was set can't be missed
            bool finishedSnapshot = _finished;

            string resultAsString = null;
            lock( _results )
            {
                if( _results.Count > 0 )
                    resultAsString = _results.Dequeue();
            }

            if( resultAsString != null )
            {
                InsertIntoDatabase( resultAsString ); // implement this however
            }
            else if( finishedSnapshot )
            {
                break; // producer is done and the result queue is drained
            }
            else
            {
                Thread.Sleep( 1 ); // prevents the CPU usage from being 100%
            }
        }
    }

    // stubs for the parts that are specific to your application
    static IEnumerable<string> GetFileNamesToProcess() { /* ... */ return new string[0]; }
    static string ProcessFileAndGetResult( string fileName ) { /* ... */ return fileName; }
    static void InsertIntoDatabase( string result ) { /* ... */ }
}
I'm pretty sure there are ways to make that "better", but it should do the trick: you can read and process data while also adding completed data to the database, and take advantage of threading.
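For what it's worth, if you're on .NET 4 or later, BlockingCollection<T> is one example of "better": it handles the locking, the waiting, and the finished signal for you. A minimal sketch of the result-queue side, reusing the names from above:

using System.Collections.Concurrent;

static BlockingCollection<string> _resultQueue = new BlockingCollection<string>();

// producer (file-processing) side:
//   _resultQueue.Add( resultAsString );
//   ...and once every file is done:
//   _resultQueue.CompleteAdding();

static void UpdateDatabase()
{
    // blocks while the collection is empty, and exits cleanly
    // once CompleteAdding() has been called and the queue is drained
    foreach( string result in _resultQueue.GetConsumingEnumerable() )
    {
        InsertIntoDatabase( result );
    }
}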
If you want another thread to process files, or another to update the database, just create a new Thread( MethodName ) and call Start() on it, for example:
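// a hypothetical second worker draining the same _filesToProcess queue
Thread secondFileProcessingThread = new Thread( ProcessFiles );
secondFileProcessingThread.Start();
// ... later, alongside the other Join() calls:
secondFileProcessingThread.Join();

If you do add a second processing thread, see the comment in ProcessFiles about replacing the _finished bool with a counter, so the database thread doesn't stop while one worker is still going.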
It's not the simplest example, but I think it's thorough. You're sharing two queues, and each one has to be locked before it's accessed. You're keeping track of when each thread should finish, and data is handed between the threads via the Queues but never processed more than once.
Hope that helps.