I use producer/consumer pattern with FileHelpers library to import data from one file (which can be huge) using multiple threads. Each thread is supposed to import a chunk of that file and I would like to use LineNumber property of the FileHelperAsyncEngine instance that is reading the file as primary key for imported rows. FileHelperAsyncEngine internally has an IEnumerator IEnumerable.GetEnumerator(); which is iterated over using engine.ReadNext() method. That internally sets LineNumber property (which seems is not thread safe).
Consumers will have Producers assiciated with them that will supply DataTables to Consumers which will consume them via SqlBulkLoad class which will use IDataReader implementation which will iterate over a collection of DataTables which are internal to a Consumer instance. Each instance of will have one SqlBulkCopy instance associate with it.
I have thread locking issue. Below is how I create multiple Producer threads. I start each thread afterwords. Produce method on a producer instance will be called determining which chunk of input file will be processed. It seems that engine.LineNumber is not thread safe and I doesn't import a proper LineNumber in the database. It seems that by the time engine.LineNumber is read some other thread called engine.ReadNext() and changed engine.LineNumber property. I don't want to lock the loop that is supposed to process a chunk of input file because I loose parallelism. How to reorganize the code to solve this threading issue?
Thanks Rad
for (int i = 0; i < numberOfProducerThreads; i++)
DataConsumer consumer = dataConsumers[i];
//create a new producer
DataProducer producer = new DataProducer();
//consumer has already being created
consumer.Subscribe(producer);
FileHelperAsyncEngine orderDetailEngine = new FileHelperAsyncEngine(recordType);
orderDetailEngine.Options.RecordCondition.Condition = RecordCondition.ExcludeIfBegins;
orderDetailEngine.Options.RecordCondition.Selector = STR_ORDR;
int skipLines = i * numberOfBufferTablesToProcess * DataBuffer.MaxBufferRowCount;
Thread newThread = new Thread(() =>
{
producer.Produce(consumer, inputFilePath, lineNumberFieldName, dict, orderDetailEngine, skipLines, numberOfBufferTablesToProcess);
consumer.SetEndOfData(producer);
});
producerThreads.Add(newThread); thread.Start();}
public void Produce(DataConsumer consumer, string inputFilePath, string lineNumberFieldName, Dictionary<string, object> dict, FileHelperAsyncEngine engine, int skipLines, int numberOfBufferTablesToProcess)
{
lock (this)
{
engine.Options.IgnoreFirstLines = skipLines;
engine.BeginReadFile(inputFilePath);
}
int rowCount = 1;
DataTable buffer = consumer.BufferDataTable;
while (engine.ReadNext() != null)
{
lock (this)
{
dict[lineNumberFieldName] = engine.LineNumber;
buffer.Rows.Add(ObjectFieldsDataRowMapper.MapObjectFieldsToDataRow(engine.LastRecord, dict, buffer));
if (rowCount % DataBuffer.MaxBufferRowCount == 0)
{
consumer.AddBufferDataTable(buffer);
buffer = consumer.BufferDataTable;
}
if (rowCount % (numberOfBufferTablesToProcess * DataBuffer.MaxBufferRowCount) == 0)
{
break;
}
rowCount++;
}
}
if (buffer.Rows.Count > 0)
{
consumer.AddBufferDataTable(buffer);
}
engine.Close();
}