I hope you will bear with me. I wanted to provide as much information as I can. The main problem is how to create a structure (like a stack) that will be used by multiple threads that will pop a value and use it to process one big flat file and possibly do cycling again and again until the whole file is processed. If a file has 100.000 records that can be processed by 5 threads using 2.000 row chunks then each thread will get 10 chunks to process.
My goal is to move data in a flat file (with Header...Subheader...Detail, Detail, Detail, ...Detail, SubFooter, Subheader...Detail, Detail, Detail, ...Detail, SubFooter, Subheader...Detail, Detail, Detail, ...Detail, SubFooter, Footer structure) into OLTP DB that has recovery mode to Simple (possible Full) into 3 tables: 1st representing Subheader's unique key present in Subheader row, 2nd an intermediate table SubheaderGroup, representing grouping of detail rows in chunks of 2000 records (needs to have Subheader's Identity PK as its FK and 3rd representing Detail rows with FK pointing to Subheader PK.
I am doing manual transaction management since I can have tens of thousands of Detail rows and I am using a special field that is set to 0 in destination tables during the load and then at the end of file processing I am doing a transactional upate changing this value to 1 which can signal other application that the loading finished.
I want to chop this flat file into multiple equal pieces (same number of rows) that can be processed with multiple threads and imported using SqlBulkCopy using IDataReader that is created from Destination table metadata).
I want to use producer/consumer pattern (as explained in link below - pdf analysis and code sample) to use SqlBulkCopy with SqlBulkCopyOptions.TableLock option. http://sqlblog.com/blogs/alberto_ferrari/archive/2009/11/30/sqlbulkcopy-performance-analysis.aspx This pattern enables creating multiple producers and the equivalent number of consumers need to subscribe to producers to consume the row.
In TestSqlBulkCopy project, DataProducer.cs file there is a method that simulates production of thousands of records.
public void Produce (DataConsumer consumer, int numberOfRows) {
int bufferSize = 100000;
int numberOfBuffers = numberOfRows / bufferSize;
for (int bufferNumber = 0; bufferNumber < numberOfBuffers; bufferNumber++) {
DataTable buffer = consumer.GetBufferDataTable ();
for (int rowNumber = 0; rowNumber < bufferSize; rowNumber++) {
object[] values = GetRandomRow (consumer);
buffer.Rows.Add (values);
}
consumer.AddBufferDataTable (buffer);
}
}
This method will be executed in the context of a new thread. I want this new thread to read only a unique chunk of original flat file and another thread will strart processing the next chunk. Consumers would then move data (that is pumped to them) to SQL Server DB using SqlBulkCopy ADO.NET class.
So the question here is about main program dictating what lineFrom to lineTo should be processed by each thread and I think that should happen during thread creation. Second solution is probably for threads to share some structure and use something unique to them (like thread number or sequence number) to lookup a shared structure (possibly a stack and pop a value (locking a stack while doing it) and then next thread will then pickup the next value. The main program will pick into the flat file and determine the size of chunks and created the stack.
So can somebody provide some code snippets, pseudo cod on how multiple threads would process one file and only get a unique portion of that file?
Thanks, Rad