Taking into account the concerns others have voiced about attempting parallel I/O against a single disk device, it does look like your processing model can be broken into parts: you have a FileSources.Keys input list, and the output is simply the compute results appended to aggregatedDefinitions.
Here's how you can break that up for processing on multiple threads and preserve the order of your current results:
First, decide how many threads to use. For compute-intensive tasks, there is usually no point in spinning up more threads than you have CPU cores. For I/O-bound tasks, you can use more threads than cores, since the threads will spend most of their time waiting on I/O completion.
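As a rough sketch (assuming .NET, where Environment.ProcessorCount reports the core count), picking the thread count could look like this; SuggestThreadCount is a made-up helper and the 2x oversubscription factor for I/O-bound work is an illustrative guess, not a framework rule:

static int SuggestThreadCount(bool ioBound)
{
    // Compute-bound: one thread per core. I/O-bound: oversubscribe,
    // since the threads spend most of their time blocked on the device.
    int cores = Environment.ProcessorCount;
    return ioBound ? cores * 2 : cores; // the 2x factor is a guess; measure to tune
}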
Let's assume your DefinitionLookup is compute-intensive rather than I/O-bound, and that you're running on a dual-core CPU. Under those conditions, two threads would be a good choice.
Next, break the input into largish chunks, preserving the order of the inputs. For our two-thread scenario, send the first half of the FileSources.Keys list to the first thread and the second half to the second thread.
In each thread, process the inputs as before, but append the output to a local list object, not the final (shared) aggregatedDefinitions list.
After the threads have finished their processing, have the main thread concatenate each thread's list into the final aggregatedDefinitions list, in the correct order. (Thread 1, which received the first half of the inputs, produces list1, which should be appended to the master list before Thread 2's results.)
Something like this:
static void MainThread()
{
    List<string> input = new List<string>(); // fill with data
    int half = input.Count / 2;

    // Captured locals can be passed as out parameters from inside a lambda,
    // which is how each task hands its private result list back to us.
    ManualResetEvent event1 = new ManualResetEvent(false);
    List<string> results1 = null;
    // give the first half of the input to the first thread
    ThreadPool.QueueUserWorkItem(r => ComputeTask(input.GetRange(0, half), out results1, event1));

    ManualResetEvent event2 = new ManualResetEvent(false);
    List<string> results2 = null;
    // second half of the input to the second thread
    ThreadPool.QueueUserWorkItem(r => ComputeTask(input.GetRange(half, input.Count - half), out results2, event2));

    // wait for both tasks to complete
    WaitHandle.WaitAll(new WaitHandle[] { event1, event2 });

    // combine the results, preserving order
    List<string> finalResults = new List<string>();
    finalResults.AddRange(results1);
    finalResults.AddRange(results2);
}

static void ComputeTask(List<string> input, out List<string> output, ManualResetEvent signal)
{
    output = new List<string>(); // each task writes to its own local list
    foreach (var item in input)
    {
        // do work here
        output.Add(item);
    }
    signal.Set(); // tell the main thread this chunk is done
}
Also, even if all the I/O activity hits one disk drive, you could get some performance benefit by using asynchronous file reads. The idea is to issue the next file read request as soon as you receive the data from the previous read, process that data, and then wait for the next read to complete. This lets you use the CPU for processing while the disk handles the I/O request, without explicitly managing threads yourself.
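A minimal sketch of that overlap using FileStream.BeginRead/EndRead; ReadAndProcess and processData are placeholder names, and a production version would loop until the whole file has been read (a single BeginRead may return fewer bytes than requested) and add error handling:

static void ReadAndProcess(List<string> paths, Action<byte[]> processData)
{
    byte[] previous = null;
    foreach (string path in paths)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, FileOptions.Asynchronous))
        {
            byte[] buffer = new byte[fs.Length];
            // start the next read before touching the previous data
            IAsyncResult pending = fs.BeginRead(buffer, 0, buffer.Length, null, null);

            if (previous != null)
                processData(previous); // CPU work overlaps the pending disk I/O

            fs.EndRead(pending); // block until this file's read completes
            previous = buffer;
        }
    }
    if (previous != null)
        processData(previous); // the last chunk has no read to overlap with
}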
Compare these (pseudo) execution timelines to read and process 4 chunks of data. Assume a file read takes about 500 time units to complete, and processing that data takes about 10 time units.
Synchronous file I/O:
read (500)
process data (10)
read (500)
process data (10)
read (500)
process data (10)
read (500)
process data (10)
Total time: 2040 time units
Async file I/O:
begin async read 1
async read 1 completed (500)
begin async read 2 / process data 1 (10)
async read 2 completed (500)
begin async read 3 / process data 2 (10)
async read 3 completed (500)
begin async read 4 / proces data 3 (10)
async read 4 completed (500)
process data 4 (10)
Total time: 2010 time units
The processing of data 1, 2, and 3 happens while the next read request is pending, so compared to the first timeline you get that processing time essentially for free. Only the processing of the last chunk adds to the total, because there is no pending read for it to overlap with.
The 50:1 ratio used here (500 time units for I/O, 10 for compute) is conservative; real I/O often costs orders of magnitude more than the compute that follows it. As you can see, when the compute step is quick, you don't get much performance benefit out of all this work.
You get greater value from the effort of doing async I/O when the work done in that "free" time is substantial. Cryptography or image processing, for example, would likely be a win, but string concatenation probably would not be worth it. Writing data to another file could also be worthwhile in the async overlap, though as others have noted, the benefit is diminished if all the I/O targets the same physical device.