views:

275

answers:

3

I have the following code that processes a binary file. I want to split the processing workload by using threads and assigning each line of the binary file to threads in the ThreadPool. Processing time for each line is only small but when dealing with files that might contain hundreds of lines, it makes sense to split the workload.

My question is regarding the BinaryReader and thread safety. First of all, is what I am doing below acceptable. I have a feeling it would be better to pass only the binary for each line to the PROCESS_Binary_Return_lineData method.

Please note the code below is conceptual. I looking for a but of guidance on this as my knowledge of multi-threading is in its infancy. Perhaps there is a better way to achieve the same result, i.e. split processing of each binary line.

        var dic = new Dictionary<DateTime, Data>();        
        var resetEvent = new ManualResetEvent(false);

        using (var b = new BinaryReader(File.Open(Constants.dataFile, 
                            FileMode.Open, FileAccess.Read, FileShare.Read)))
        {
        var lByte = b.BaseStream.Length;
        var toProcess = 0;

        while (lByte >= DATALENGTH)
        {
            b.BaseStream.Position = lByte;
            lByte = lByte - AB_DATALENGTH;

            ThreadPool.QueueUserWorkItem(delegate
            {
                Interlocked.Increment(ref toProcess);
                var lineData = PROCESS_Binary_Return_lineData(b);

                lock(dic)
                {
                    if (!dic.ContainsKey(lineData.DateTime))
                    {
                     dic.Add(lineData.DateTime, lineData); 
                    }
                }

                if (Interlocked.Decrement(ref toProcess) == 0) resetEvent.Set();
            }, null);
        }
        }

        resetEvent.WaitOne();
A: 

"I have a feeling it would be better to pass only the binary for each line to the PROCESS_Binary_Return_lineData method."

yes, you need to do that, since your delegate may not get round to reading from the BinaryReader, before it gets repositioned

Mark Heath
+2  A: 

This doesn't look thread-safe to me. If you have more than one work item queued, and two of them happen to run at the same time, the reader's position could easily change between assignment and reading.

If you insist on using threads for this, you'd do better to read the data in your main thread and queue up the resulting byte arrays for reading. Any solution involving each thread reading from the file will involve locking, and at that point, you're not gaining anything at all from using threads.

cHao
would that be byte arrays derived by: - var LineBytes = b.ReadBytes(DATALENGTH);or would i have to clone the data to a byte array to make sure it is no longer referencing the filestream.
washtik
You should just be able to use the result from b.ReadBytes(DATALENGTH).
cHao
+1  A: 

It very rarely makes sense to use threads to improve file processing performance. A thread, when run on a multi-core CPU, provides more CPU cycles. This is rarely the resource you are short of when processing files. You need more disks. Not an option of course.

Smoke test this first. Reboot your machine so the file won't be stored in the file system cache. Run your single-threaded program and observe the CPU load. Taskmgr.exe, Performance tab is good for that. If you don't see one CPU maxed-out at 100% load then adding another CPU cannot make your program quicker.

Hans Passant