Hi guys, I'm new to threading.

I have a queue of operations to be performed on XML files (add node, delete node, etc.).

1] There are 'n' XML files, and for each file a thread from the thread pool is allocated using ThreadPool.QueueUserWorkItem to perform the operations on that file.

I want to achieve both concurrency and ordering of operations (the ordering is important) using threads.
E.g., suppose operations [a1,a2,a3,a4,a5] are to be performed on file "A.xml"
and operations [b1,b2,b3,b4,b5,b6,b7] are to be performed on file "B.xml" ...
I want to allocate threads so that the operations on each file run in the
given order, while different files are processed concurrently.

2] Also, is it possible to assign each operation its own thread and still achieve concurrency and preserve the order?

In the STA model I did something similar:

while (queue.Count > 0)
{
    File f = queue.Dequeue(); // get the next file from the queue
    OperationList oprlst = getOperationsForFile(f); // returns the list, e.g. [a1,a2,a3,a4,a5]
    foreach (Operation oprn in oprlst)
    {
        performOperation(f, oprn);
        // In the MTA version I want to wait until operation "a1" completes before "a2"
        // starts, making threads wait while the file or operation a(i) is in use.
    }
}

I want to do this concurrently while preserving the order of operations. The threads (one per operation) can wait on one file, but different operations take different amounts of time to execute.

I tried AutoResetEvent and WaitHandle.WaitAll(..), but that made the while loop stop until all the 'a(i)' operations finished. I want both a(i) and b(j) to run concurrently (but with the ordering preserved within a(i) and within b(j)).

I'm currently using .NET 2.0.

This is quite similar to, and is part of, another question I asked.

+1  A: 

You should avoid using thread-blocking techniques like Monitor locks and WaitHandle structures in ThreadPool threads, since those threads are shared with other work queued in your process. You need to have your threading be based around individual files. If an individual file doesn't take that long to process (and you don't have too many files), then the ThreadPool will work.

ThreadPool Implementation

You could just use QueueUserWorkItem on a file-centric method, something like this:

private void ProcessFile(Object data)
{ 
    File f = (File)data;

    foreach(Operation oprn in getOperationsForFile(f))
    {
        performOperation(f, oprn);
    }
}

Then in your code that processes the files, do this:

while(queue.Count > 0)
{
    ThreadPool.QueueUserWorkItem(new WaitCallback(ProcessFile), queue.Dequeue());
}

If you need your calling thread to block until they are all complete, then a WaitHandle is OK (since you're blocking your own thread, not the ThreadPool thread). You will, however, have to create a small payload class to pass it to the thread:

private class Payload
{
    public File File;
    public AutoResetEvent Handle;
}

private void ProcessFile(Object data)
{ 
    Payload p = (Payload)data;

    foreach(Operation oprn in getOperationsForFile(p.File))
    {
        performOperation(p.File, oprn);
    }

    p.Handle.Set();
}

...

WaitHandle[] handles = new WaitHandle[queue.Count];
int index = 0;

while(queue.Count > 0)
{        
    AutoResetEvent handle = new AutoResetEvent(false); // starts unsignaled
    handles[index] = handle;

    Payload p = new Payload();

    p.File = queue.Dequeue();
    p.Handle = handle;

    ThreadPool.QueueUserWorkItem(new WaitCallback(ProcessFile), p);

    index++;
}

WaitHandle.WaitAll(handles);
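
One caveat with the approach above: WaitHandle.WaitAll accepts at most 64 handles, so it stops working once there are more files than that. A possible alternative, sketched here as an assumption rather than as part of the original answer, is to share one counter and one ManualResetEvent between all work items. The names CompletionCounter, Signal, Wait and CompletionCounterDemo are hypothetical, and the body of the work item is a placeholder for the real per-file processing.

```csharp
using System;
using System.Threading;

// Hypothetical helper: signals a single event once a fixed number of work items have finished.
public sealed class CompletionCounter
{
    private int remaining;
    private readonly ManualResetEvent done = new ManualResetEvent(false);

    public CompletionCounter(int count)
    {
        if (count <= 0) throw new ArgumentOutOfRangeException("count");
        remaining = count;
    }

    // Called by each work item when it finishes; the last one releases the waiter.
    public void Signal()
    {
        if (Interlocked.Decrement(ref remaining) == 0)
        {
            done.Set();
        }
    }

    // Blocks the calling thread until every work item has called Signal.
    public void Wait()
    {
        done.WaitOne();
    }
}

public static class CompletionCounterDemo
{
    // Queues fileCount placeholder work items and returns how many of them ran.
    public static int Run(int fileCount)
    {
        int processed = 0;
        CompletionCounter counter = new CompletionCounter(fileCount);

        for (int i = 0; i < fileCount; i++)
        {
            ThreadPool.QueueUserWorkItem(delegate(object state)
            {
                try
                {
                    // Placeholder for the real work: run one file's operations in order.
                    Interlocked.Increment(ref processed);
                }
                finally
                {
                    counter.Signal(); // signal even if the work throws
                }
            });
        }

        counter.Wait();
        return processed;
    }
}
```

Only the calling thread blocks here, the same as with WaitAll, and the number of files is no longer tied to the handle limit.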

Thread Implementation

If, however, you have a large number of files (or it may take a significant amount of time for your files to process), then creating your own threads is a better idea. This also allows you to get away with omitting the WaitHandles.

private void ProcessFile(File f)
{     
    foreach(Operation oprn in getOperationsForFile(f))
    {
        performOperation(f, oprn);
    }
}

private object queueLock = new object();

private void ThreadProc()
{
    bool okToContinue = true;

    while(okToContinue)
    {
        File f = null;

        lock(queueLock)
        {
            if(queue.Count > 0) 
            {
                f = queue.Dequeue();
            }
            else
            {
                f = null;
            }
        }

        if(f != null)
        {
            ProcessFile(f);
        }
        else
        {
            okToContinue = false;
        }
    }
}

...

Thread[] threads = new Thread[20]; // arbitrary number, choose the size that works

for(int i = 0; i < threads.Length; i++)
{
    threads[i] = new Thread(new ThreadStart(ThreadProc));

    threads[i].Start();
}

//if you need to wait for them to complete, then use the following loop:
for(int i = 0; i < threads.Length; i++)
{
    threads[i].Join();
}

The preceding example is a very rudimentary thread pool, but it should illustrate what needs to be done.

Adam Robinson
I thought using the ThreadPool gives better performance than normal threads? Will it help in this situation?
Amitd
It eliminates the cost of spinning up your own threads, but depending upon how many files you have to process that fixed cost may not be an issue. There are also reasons NOT to use the thread pool, such as if you have many operations to queue up individually, or if any given operation is particularly long-running.
Adam Robinson
1] Is it possible to monitor the progress of each thread? 2] Also, in which order do threads acquire the lock? Suppose thread t(i) takes longer to complete and multiple threads, say t(i+1), t(i+2)..t(i+n), are waiting on the lock. Is the order preserved?
Amitd
Suppose I have 'n' threads and m (> n) files to be processed. How can I reuse the existing threads for the remaining files?
Amitd
1) You would have to implement some kind of event reporting if you wanted to monitor the progress. 2) The ordering of the lock acquisition is not guaranteed (or even really defined), but the lock only exists for the purpose of getting the next file from the queue. The actual file processing takes place outside of the lock, so that would not block other threads.
Adam Robinson
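
As a rough illustration of the "event reporting" idea in the comment above, here is a minimal sketch that raises an event after each operation. It is an assumption about one way to do it, not the answerer's code: OperationProgressEventArgs, FileProcessor and the Progress event are hypothetical names, and plain strings stand in for the File and Operation types used elsewhere in this thread.

```csharp
using System;

// Hypothetical event payload: which file, and how many of its operations are done.
public sealed class OperationProgressEventArgs : EventArgs
{
    public readonly string FileName;
    public readonly int Completed;
    public readonly int Total;

    public OperationProgressEventArgs(string fileName, int completed, int total)
    {
        FileName = fileName;
        Completed = completed;
        Total = total;
    }
}

public sealed class FileProcessor
{
    // Raised on the worker thread after each operation completes.
    public event EventHandler<OperationProgressEventArgs> Progress;

    // Runs the operations for one file in order and reports after each one.
    public void ProcessFile(string fileName, string[] operations)
    {
        for (int i = 0; i < operations.Length; i++)
        {
            // The real performOperation(file, operation) call would go here.

            // Copy the delegate first so a concurrent unsubscribe cannot null it mid-call.
            EventHandler<OperationProgressEventArgs> handler = Progress;
            if (handler != null)
            {
                handler(this, new OperationProgressEventArgs(fileName, i + 1, operations.Length));
            }
        }
    }
}
```

Because the handlers run on whichever thread is processing the file, anything they touch (a UI control, a shared counter) needs its own synchronization or marshalling.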
As for the number of threads, if you take this approach then the threads will continue to run until the queue has been exhausted; I was under the impression that your queue was fixed; if this is not the case and you have items being added sporadically, then you'll have to keep a worker thread alive and use another `WaitHandle` to signal it to start processing. That's a more complex approach.
Adam Robinson
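
To make the "keep a worker thread alive and signal it with a WaitHandle" idea a little more concrete, here is a minimal sketch under stated assumptions. It is not the code from the linked article: SignaledWorker, FileHandler, Enqueue and Stop are hypothetical names, and a string stands in for the File type.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public delegate void FileHandler(string file);

// Hypothetical sketch: one long-lived worker that sleeps until work arrives.
public sealed class SignaledWorker
{
    private readonly Queue<string> queue = new Queue<string>();
    private readonly object queueLock = new object();
    private readonly AutoResetEvent workAvailable = new AutoResetEvent(false);
    private readonly FileHandler handler;
    private readonly Thread thread;
    private volatile bool stopping;

    public SignaledWorker(FileHandler handler)
    {
        this.handler = handler;
        thread = new Thread(new ThreadStart(Run));
        thread.IsBackground = true;
        thread.Start();
    }

    // Can be called at any time, from any thread.
    public void Enqueue(string file)
    {
        lock (queueLock)
        {
            queue.Enqueue(file);
        }
        workAvailable.Set(); // wake the worker if it is waiting
    }

    // Lets the worker drain what is already queued, then waits for it to exit.
    public void Stop()
    {
        stopping = true;
        workAvailable.Set();
        thread.Join();
    }

    private void Run()
    {
        while (true)
        {
            string file = null;
            lock (queueLock)
            {
                if (queue.Count > 0)
                {
                    file = queue.Dequeue();
                }
            }

            if (file != null)
            {
                handler(file); // process outside the lock
                continue;
            }

            if (stopping)
            {
                return;
            }

            workAvailable.WaitOne(); // queue is empty: sleep until signaled
        }
    }
}
```

The AutoResetEvent stays signaled if Set is called while the worker is busy, so an item added between the empty-queue check and WaitOne is not missed. Several such workers sharing one queue would need a different wake-up scheme, since each Set releases only one waiter.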
Yes, the queue size is fixed, and the above approach works great, but I was just curious about the latter case. If possible, could you elaborate on the last approach? [I only know the basics of threading, so this all looks complex to me anyway :)]
Amitd
I'll be happy to elaborate, but can you give me some direction as to where you need explanation?
Adam Robinson
Thanks. I wanted to know the approach when the queue size is dynamic and items get added sporadically: how to monitor the queue size and then invoke the threads again when the queue reaches a certain size.
Amitd
@Amitd: Rather than going into that here, this has sparked some interest in me and I'm in the process of putting together a project and a short article on this. I'll post a link as soon as it's active.
Adam Robinson
@Amitd: Check out http://www.adam-robinson.net/blog/post/ProcessQueue.aspx for what I've thrown together. It's a more fully fleshed-out example of what I was talking about, and it handles items being added to the queue over time.
Adam Robinson
nice. great thx :) great help
Amitd
@Amitd: No problem :) If any of the answers here answered your question, please don't forget to accept it as the answer. It will help others who are looking for similar solutions in the future more easily find what they need. Thanks!
Adam Robinson
@Adam yep did it now :) thx a lot
Amitd
+5  A: 

Either create a new thread for each file, or just use one ThreadPool.QueueUserWorkItem call for each file. Either way, you want the operations for the file to be executed sequentially in order, so there's no point in using multiple threads for that part. In your case the only parallelism available is across different files, not operations.

Jon Skeet
I guess that's the way it is happening now. I just had the operations happening in random sequence; that code helped to preserve the order.
Amitd
Just running all the operations for a single file in a single thread sequentially will definitely preserve the order. There's no need to start one thread per operation.
Jon Skeet
Suppose I have 'n' threads and m (> n) files to be processed. How can I reuse the existing threads for the remaining files?
Amitd
Either use the thread pool and it'll do it automatically, or create a producer/consumer queue - create n consumer threads, add files to the common queue which the threads read from.
Jon Skeet
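
For readers who want to see the shape of the producer/consumer option, here is a minimal sketch built on Monitor.Wait and Monitor.Pulse, which are available in .NET 2.0. It is an illustration under assumptions, not the commenter's code: WorkQueue, Close, TryDequeue and ConsumerDemo are hypothetical names, strings stand in for files, and the consumer body is a placeholder.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical blocking queue: consumers sleep while it is empty and exit once it is closed.
public sealed class WorkQueue<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly object sync = new object();
    private bool closed;

    public void Enqueue(T item)
    {
        lock (sync)
        {
            items.Enqueue(item);
            Monitor.Pulse(sync); // wake one waiting consumer
        }
    }

    // Tells consumers that no more items will arrive.
    public void Close()
    {
        lock (sync)
        {
            closed = true;
            Monitor.PulseAll(sync);
        }
    }

    // Blocks until an item is available; returns false once closed and drained.
    public bool TryDequeue(out T item)
    {
        lock (sync)
        {
            while (items.Count == 0 && !closed)
            {
                Monitor.Wait(sync);
            }
            if (items.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = items.Dequeue();
            return true;
        }
    }
}

public static class ConsumerDemo
{
    // Starts consumerCount threads, feeds them fileCount names, returns how many were handled.
    public static int Run(int consumerCount, int fileCount)
    {
        WorkQueue<string> queue = new WorkQueue<string>();
        int processed = 0;
        Thread[] consumers = new Thread[consumerCount];

        for (int i = 0; i < consumers.Length; i++)
        {
            consumers[i] = new Thread(new ThreadStart(delegate()
            {
                string file;
                while (queue.TryDequeue(out file))
                {
                    // Placeholder: run this file's operations sequentially here.
                    Interlocked.Increment(ref processed);
                }
            }));
            consumers[i].Start();
        }

        for (int i = 0; i < fileCount; i++)
        {
            queue.Enqueue("file" + i + ".xml");
        }
        queue.Close();

        foreach (Thread t in consumers)
        {
            t.Join();
        }
        return processed;
    }
}
```

Each consumer thread is reused for as many files as it can pull from the queue, so n threads can work through m > n files, and each file is still handled start to finish by a single thread.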
I tried the "Thread Implementation" from the other answer. Is it similar to a producer/consumer queue? But I can't reuse a thread once it finishes its work. The number of files in the queue keeps changing, but the number of threads is constant.
Amitd
A: 

Threads are for operations that can be done asynchronously, and you want your operations done synchronously. The processing of the files, though, looks like it can be done asynchronously (multithreaded).

CSharpAtl
Yeah, it also seems like a better approach than the STA-based idea.
Amitd