I am using the following two methods. DoMyWork1 scales well: running three of them on three threads takes about 6 seconds. DoMyJob does not scale at all: if one thread takes 4 seconds, three threads take 13 seconds. What am I doing wrong? Do file reads and/or writes need special thread handling beyond the thread pool?

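My calling code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Threading;

public static void Process(MyDelegate md, int threads)
{
    int threadcount = threads;

    // One event per work item; WaitAll blocks until every item has called Set().
    // WaitHandle.WaitAll throws if given more than 64 handles.
    ManualResetEvent[] doneEvents = new ManualResetEvent[threadcount];

    Stopwatch sw = Stopwatch.StartNew();

    List<string> myfiles = GetMyFiles(@"c:\");

    for (int i = 0; i < threadcount; i++)
    {
        doneEvents[i] = new ManualResetEvent(false);
        MyState ms = new MyState();
        ms.ThreadIndex = i;
        ms.EventDone = doneEvents[i];
        ms.files = myfiles;   // every work item gets the same, full file list
        ThreadPool.QueueUserWorkItem(md.Invoke, ms);
    }

    WaitHandle.WaitAll(doneEvents);

    sw.Stop();
    // Print TotalSeconds so the number matches the "seconds" label
    // (TimeSpan.ToString() would print hh:mm:ss.fffffff).
    Console.WriteLine("All complete in {0} seconds.", sw.Elapsed.TotalSeconds);
    Console.ReadLine();
}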

public static void DoMyWork1(Object threadContext)
{
    MyState st = (MyState)threadContext;
    Console.WriteLine("thread {0} started...", st.ThreadIndex);

    Thread.Sleep(5000);

    Console.WriteLine("thread {0} finished...", st.ThreadIndex);
    st.EventDone.Set();
}



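public static void DoMyJob(Object threadContext)
{
    MyState st = (MyState)threadContext;
    Console.WriteLine("I am in thread {0} started...", st.ThreadIndex);

    string[] mystrings = new string[] { "one", "two", "three" };

    foreach (string s in mystrings)
    {
        foreach (string file in st.files)
        {
            // Each file is re-read once per search word.
            // "using" closes the file handle, which the original one-liner never did.
            string text;
            using (StreamReader reader = new StreamReader(file))
            {
                text = reader.ReadToEnd();
            }

            if (!text.Contains(s))
            {
                AppendToFile(String.Format("{0} word searching in file {1} in thread {2}", s, file, st.ThreadIndex));
            }
        }
    }

    Console.WriteLine("I am in thread {0} ended...", st.ThreadIndex);
    st.EventDone.Set();   // signal Process() that this work item has finished
}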
A: 

All file access will become serial in the OS layer and threading it as such is going to result in exactly what you see.

Jesse C. Slicer
Is there any way to multithread file processing?
dotnet-practitioner
I don't believe that is correct (that file access is ever serial), although I don't know exactly what you mean by 'serial' in this context. Certainly you can have two threads both doing file IO at the same time. Obviously the head on the disc can only be over one part of the platter at a time, but the hardware and OS generally do a pretty good job of keeping that from being a problem.
Bruce
http://stackoverflow.com/questions/93834/when-is-multi-threading-not-a-good-idea
http://objectmix.com/smalltalk/761155-multi-threaded-file-access.html
These were my sources.
Jesse C. Slicer
A: 

I'm a little surprised. I'd expect the first access to these files to be cached, with the remaining accesses just hitting memory, so three threads shouldn't be much slower than one. If you're writing to each file, that would make a difference. What exactly does the AppendToFile function do?
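The question never shows AppendToFile. If it appends to a single shared log file, concurrent writers do need coordination: File.AppendAllText opens the file for writing while allowing others only to read, so two threads appending at the same moment can get an IOException. The sketch below is a hypothetical replacement, not the asker's code; the class name SharedLog and the LogPath location are assumptions. It serializes appends with a lock:

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the question's AppendToFile helper, assuming all
// threads append to one shared log file. Not the asker's actual implementation.
public static class SharedLog
{
    // Hypothetical log location; adjust as needed.
    public static readonly string LogPath =
        Path.Combine(Path.GetTempPath(), "search-log.txt");

    // A single lock object shared by every thread, so only one append runs at a time.
    private static readonly object Sync = new object();

    public static void AppendToFile(string line)
    {
        lock (Sync)
        {
            // Opens, appends, and closes the file on each call,
            // so no handle stays open between calls.
            File.AppendAllText(LogPath, line + Environment.NewLine);
        }
    }
}
```

The lock makes the appends safe but also serializes them, which is one more point where extra threads stop adding throughput.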

Bruce
A: 

One problem could be that you are opening and reading each file, for each new string you are looking for.

What would happen if you switched the order of your foreach loops and only appended to the file as needed?

I think you would see much better performance.

Ideally if you can take the file reading out of the loop altogether, that would be the fastest. I/O bound operations will always cause context switches waiting on the disk to return the data.
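A minimal sketch of that suggestion, assuming the same word list and message format as the question's DoMyJob: each file is read once in the outer loop, and every word is tested against the in-memory text. WordSearch and FindMissingWords are hypothetical names, and the method returns the messages instead of calling AppendToFile, so the caller can write them out in one pass.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WordSearch
{
    // Reads each file exactly once, then checks every word against the text
    // already in memory (the question's code re-reads each file once per word).
    // Returns one message per (word, file) pair where the word is missing,
    // matching the "!Contains" condition in DoMyJob.
    public static List<string> FindMissingWords(IEnumerable<string> files, string[] words, int threadIndex)
    {
        var messages = new List<string>();

        foreach (string file in files)          // outer loop: one disk read per file
        {
            string text = File.ReadAllText(file);

            foreach (string s in words)         // inner loop: memory only
            {
                if (!text.Contains(s))
                {
                    messages.Add(String.Format(
                        "{0} word searching in file {1} in thread {2}", s, file, threadIndex));
                }
            }
        }

        return messages;
    }
}
```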

GrayWizardx
A: 

Threads can improve program performance only if the program is starved for CPU resources. That's not the case for your program, as should be readily visible from the Taskmgr.exe Performance tab. The slow resource here is your hard disk, or the network card. The ReadToEnd() call is glacially slow, waiting for the disk to retrieve the file data. Anything else you do with the file data is easily 3 orders of magnitude faster than that.
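One way to check this claim on a particular machine is to time the two halves of the work separately. This is a hypothetical helper (IoVsCpuTiming.Measure is not from the question) that uses Stopwatch to accumulate read time and search time across all files; it counts matches so the search result is actually used.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

public static class IoVsCpuTiming
{
    // Accumulates time spent reading files separately from time spent
    // searching the in-memory text. Hits is the number of word matches.
    public static (TimeSpan Read, TimeSpan Search, int Hits) Measure(IEnumerable<string> files, string[] words)
    {
        var readWatch = new Stopwatch();
        var searchWatch = new Stopwatch();
        int hits = 0;

        foreach (string file in files)
        {
            readWatch.Start();                    // Start() resumes accumulating
            string text = File.ReadAllText(file); // disk (or file system cache) access
            readWatch.Stop();

            searchWatch.Start();
            foreach (string w in words)
            {
                if (text.Contains(w)) hits++;     // CPU and memory work only
            }
            searchWatch.Stop();
        }

        return (readWatch.Elapsed, searchWatch.Elapsed, hits);
    }
}
```

Run it against files that are not already in the file system cache; on a warm cache the read time will look misleadingly small.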

The threads will just wait in turn for the disk data. In fact, there's a good chance that the threads make your program run a lot slower. They cause the disk drive head to jump back and forth between disjoint tracks on the disk, since each thread is working with a different file. The one thing that is really slow is making the head seek to another track: typically around 10 msec for a fast disk, the equivalent of about half a million CPU instructions.

You can't make your program run faster unless you get a faster disk. SSDs are nice. Beware of the effects of the file system cache: the second time you run your program it will run very fast, because the file data is retrieved from the cache instead of the disk. This will rarely happen in a production environment.
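For the follow-up question of how to use threads with files at all, one common arrangement consistent with this answer is a single-reader pipeline built on BlockingCollection&lt;T&gt;: one task reads the files in order, so competing threads never make the disk seek between files, and other tasks do the in-memory searching. This is a sketch under assumptions; the SingleReaderPipeline name, the bounded capacity of 4, and returning a count instead of logging are all illustrative choices. It cannot make the program faster than the disk delivers data; at best it overlaps the CPU work with the reads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class SingleReaderPipeline
{
    // One producer task reads files sequentially; workerCount consumer tasks
    // search the text. Returns how many (word, file) pairs had the word missing.
    public static int CountMissing(IList<string> files, string[] words, int workerCount)
    {
        if (workerCount < 1) throw new ArgumentOutOfRangeException(nameof(workerCount));

        // Bounded so the reader cannot run far ahead of the workers and fill memory.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 4))
        {
            Task reader = Task.Run(() =>
            {
                try
                {
                    foreach (string file in files)
                        queue.Add(File.ReadAllText(file));   // the only disk access
                }
                finally
                {
                    queue.CompleteAdding();   // lets the consumers' loops end
                }
            });

            Task<int>[] workers = Enumerable.Range(0, workerCount)
                .Select(_ => Task.Run(() =>
                {
                    int missing = 0;
                    foreach (string text in queue.GetConsumingEnumerable())
                        missing += words.Count(w => !text.Contains(w));
                    return missing;
                }))
                .ToArray();

            Task.WaitAll(workers);
            reader.Wait();   // rethrows (wrapped in AggregateException) any read error
            return workers.Sum(t => t.Result);
        }
    }
}
```

Because the search is so much cheaper than the read, one or two workers are likely enough; more workers mostly add threads waiting on an empty queue.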

Hans Passant