views:

123

answers:

3

I have an application I have already started working with and it seems I need to rethink things a bit. The application is a winform application at the moment. Anyway, I allow the user to input the number of threads they would like to have running. I also allow the user to allocate the number of records to process per thread. What I have done is loop through the number of threads variable and create the threads accordingly. I am not performing any locking (and not sure I need to or not) on the threads. I am new to threading and am running into possible issue with multiple cores. I need some advice as to how I can make this perform better.

Before a thread is created some records are pulled from my database to be processed. That list object is sent to the thread and looped through. Once it reaches the end of the loop, the thread call the data functions to pull some new records, replacing the old ones in the list. This keeps going on until there are no more records. Here is my code:

private void CreateThreads()
{
    _startTime = DateTime.Now;
    var totalThreads = 0;
    var totalRecords = 0;
    progressThreadsCreated.Maximum = _threadCount;
    progressThreadsCreated.Step = 1;
    LabelThreadsCreated.Text = "0 / " + _threadCount.ToString();
    this.Update();
    for(var i = 1; i <= _threadCount; i++)
    {
        LabelThreadsCreated.Text = i + " / " + _threadCount;
        progressThreadsCreated.Value = i;
        var adapter = new Dystopia.DataAdapter();
        var records = adapter.FindAllWithLocking(_recordsPerThread,_validationId,_validationDateTime);
        if(records != null && records.Count > 0)
        {
            totalThreads += 1;
            LabelTotalProcesses.Text = "Total Processes Created: " + totalThreads.ToString();




            var paramss = new ArrayList { i, records };
            var thread = new Thread(new ParameterizedThreadStart(ThreadWorker));
            thread.Start(paramss); 
        }

        this.Update();
    }


}


private void ThreadWorker(object paramList)
{
    try
    {
        var parms = (ArrayList) paramList;
        var stopThread = false;
        var threadCount = (int) parms[0];
        var records = (List<Candidates>) parms[1];
        var runOnce = false;
        var adapter = new Dystopia.DataAdapter();
        var lastCount = records.Count;
        var runningCount = 0;
        while (_stopThreads == false)
        {
            if (!runOnce)
            {
                CreateProgressArea(threadCount, records.Count);
            }
            else
            {
                ResetProgressBarMethod(threadCount, records.Count);
            }


            runOnce = true;
            var counter = 0;
            if (records.Count > 0)
            {
                foreach (var record in records)
                {
                    counter += 1;
                    runningCount += 1;
                    _totalRecords += 1;
                    var rec = record;
                    var proc = new ProcRecords();
                    proc.Validate(ref rec);

                    adapter.Update(rec);

                    UpdateProgressBarMethod(threadCount, counter, emails.Count, runningCount);

                    if (_stopThreads)
                    {
                        break;
                    }
                }

                UpdateProgressBarMethod(threadCount, -1, lastCount, runningCount);

                if (!_noRecordsInPool)
                {
                    records = adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
                    if (records == null || records.Count <= 0)
                    {
                        _noRecordsInPool = true;
                        break;
                    }
                    else
                    {
                        lastCount = records.Count;
                    }
                }
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
A: 

The obvious answer is to question why you want threads in the first place? Where is the analysis and benchmarks that show that using threads will be an advantage?

How are you ensuring that non-gui threads do not interact with the gui? How are you ensuring that no two threads interact with the same variables or datastructures in an unsafe way? Even if you realise you do need to use locking, how are you ensuring that the locks don't result in each thread processing their workload serially, removing any advantages that multiple threads might have provided?

Arafangion
I need multiple threads becuase we are validating records and making netwrok connections to validate that data. We are dealing with millions of records in our databases and only having one process to do that doesn't seem practicle. We did try using only one thread, but it was much too slow of a process.
DDiVita
Sounds extremely IO-bound to me. (Making connections to validate the data?) I remain suspicious that threads will actually gain anything, from that description alone.
Arafangion
In this situation is does. Thanks
DDiVita
+4  A: 

Something simple you could do that would improve perf would be to use a ThreadPool to manage your thread creation. This allows the OS to allocate a group of thread paying the thread create penalty once instead of multiple times.

If you decide to move to .NET 4.0, Tasks would be another way to go.

linuxuser27
Hey, good to see you here :)
Chris Schmich
This is hardly the answer to the question - the question isn't asking about how to manage a thread pool or to optimise the number of threads - it's about the much more basic question of how to use threads in the first place.People learn about thread pools AFTER they learn about locking, and other really basic aspects of multiple threaded designs. (It's a tool, in other words).
Arafangion
Actually I did answer his question. Very precisely. He asked "... how I can make this perform better." Although you are correct regarding that a deeper understand of Threads is important a ThreadPool will make it perform better.
linuxuser27
Arguably true - generally speaking, using a thread-pool will be a better idea, but it's not addressing the primary problems he is showing. I shouldn't have downvoted this answer just because of that, however I can't remove the downvote until the answer is edited!
Arafangion
I agree it is not addressing the primary problem in question. Given his further description I would assume the entire architecture would benefit from a single client thread reading in data and placing it into a queue for other threads to read from and process in parallel. However your reply regarding I\O is something for him to consider.
linuxuser27
+3  A: 

I allow the user to input the number of threads they would like to have running. I also allow the user to allocate the number of records to process per thread.

This isn't something you really want to expose to the user. What are they supposed to put? How can they determine what's best? This is an implementation detail best left to you, or even better, the CLR or another library.

I am not performing any locking (and not sure I need to or not) on the threads.

The majority of issues you'll have with multithreading will come from shared state. Specifically, in your ThreadWorker method, it looks like you refer to the following shared data: _stopThreads, _totalRecords, _noRecordsInPool, _recordsPerThread, _validationId, and _validationDateTime.

Just because these data are shared, however, doesn't mean you'll have issues. It all depends on who reads and writes them. For example, I think _recordsPerThread is only written once initially, and then read by all threads, which is fine. _totalRecords, however, is both read and written by each thread. You can run into threading issues here since _totalRecords += 1; consists of a non-atomic read-then-write. In other words, you could have two threads read the value of _totalRecords (say they both read the value 5), then increment their copy and then write it back. They'll both write back the value 6, which is now incorrect since it should be 7. This is a classic race condition. For this particular case, you could use Interlocked.Increment to atomically update the field.

In general, to do synchronization between threads in C#, you can use the classes in the System.Threading namespace, e.g. Mutex, Semaphore, and probably the most common, Monitor (equivalent to lock) which allows only one thread to execute a specific portion of code at a time. The mechanism you use to synchronize depends entirely on your performance requirements. For example, if you throw a lock around the body of your ThreadWorker, you'll destroy any performance gains you got through multithreading by effectively serializing the work. Safe, but slow :( On the other hand, if you use Interlocked.Increment and judiciously add other synchronization where necessary, you'll maintain your performance and your app will be correct :)

Once you've gotten your worker method to be thread-safe, you should use some other mechanism to manage your threads. ThreadPool was mentioned, and you could also use the Task Parallel Library, which abstracts over the ThreadPool and smartly determines and scales how many threads to use. This way, you take the burden off of the user to determine what magic number of threads they should run.

Chris Schmich