I have a function that compresses a number of files into a single compressed file. Compression takes a long time, so I tried adding threading to my application. Say I have 20 files to compress: I split them into 4 groups of 5, one group per thread. To avoid locks, each of the 4 threads has its own set of the variables used for compression, and I wait until all 4 threads finish. The threads are working, but I see no real improvement in performance. For example, 20 files normally take about 1 minute; after adding threading the difference is only 3 to 5 seconds, and sometimes there is none. Here is the code for one thread (the other 3 threads are the same):

//main thread   
    myClassObject->thread1 = AfxBeginThread((AFX_THREADPROC)MyThreadFunction1,myClassObject);
    ....

    HANDLE threadHandles[4];
    threadHandles[0] = myClassObject->thread1->m_hThread;
    ....

    WaitForSingleObject(myClassObject->thread1->m_hThread,INFINITE);

// Thread procedure for the first worker (the other three follow the same pattern).
// AfxBeginThread expects the signature UINT f(LPVOID).
UINT MyThreadFunction1(LPVOID lparam)
{
    CMerger* myClassObject = (CMerger*)lparam;
    CString outputPath = myClassObject->compressedFilePath.GetAt(0); // contains the output path
    wchar_t* compressInputData[] = {myClassObject->thread1outPath,
                    COMPRESS,(wchar_t*)(LPCTSTR)(outputPath)};
    // Load the DLL once and pass the same handle on to the compression call.
    HINSTANCE loadmyDll = LoadLibrary(myClassObject->thread1outPath);
    fp_Decompress callCompressAction = NULL;
    int getCompressResult=0;
    myClassObject->MyCompressFunction(compressInputData,loadmyDll,callCompressAction,myClassObject->thread1outPath,
                    getCompressResult,minIndex,myClassObject->firstThread,myClassObject);
    return 0;
}
+3  A: 

Firstly, you only wait on one of the threads. I think you want WaitForMultipleObjects.
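With the `threadHandles` array from the question, the Win32 call would look like `WaitForMultipleObjects(4, threadHandles, TRUE, INFINITE)`, where `TRUE` means "wait until all of the handles are signalled". As a rough, portable illustration of the same idea (this swaps in standard C++ `std::thread` instead of MFC threads, and `run_and_wait_for_all` is a made-up name), waiting on every worker might be sketched as:

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: starts one thread per job, then blocks until every
// one of them has finished. Returns how many jobs ran to completion.
std::size_t run_and_wait_for_all(const std::vector<std::function<void()>>& jobs) {
    std::atomic<std::size_t> finished{0};
    std::vector<std::thread> workers;
    workers.reserve(jobs.size());
    for (const auto& job : jobs) {
        workers.emplace_back([&job, &finished] {
            job();
            ++finished;
        });
    }
    // Join ALL workers, not just the first one.
    for (auto& worker : workers) {
        worker.join();
    }
    return finished.load();
}
```

The point is only that the wait covers all four workers; waiting on a single handle lets the main thread carry on while the other three may still be running.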

As for the lack of speed-up, have you considered that your actual bottleneck is NOT the compression but the file loading? File loading is slow, and 4 threads contending for time slices of the hard disk could even result in lower performance.

This is why premature optimisation is evil. You need to profile, profile and profile again to work out where your REAL bottlenecks are.
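As a crude first step before reaching for a real profiler, you could time the load phase and the compress phase separately. A minimal sketch, assuming standard C++ and a made-up helper name `elapsed_ms` (nothing here comes from the question's code):

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the wall-clock time
// it took, in milliseconds.
template <typename Work>
double elapsed_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

You would wrap the file-reading step in one call and the compression step in another. If the first number dwarfs the second, adding threads to the compression is unlikely to help much.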

Edit: I can't really comment on your WaitForMultipleObjects unless I see the code. I have never had any problems with it myself ...

As for a bottleneck: it's a metaphor. If you try to pour a large amount of liquid out of a cylinder by tipping it upside down, the liquid leaves at a constant rate. If you try this with a bottle, you will notice that it can't empty as fast. This is because only so much liquid can flow through the thin part of the bottle (not to mention the air that has to enter it). Thus the rate at which the container empties is limited by the neck of the bottle (the thin part).

In programming, when you talk about a bottleneck you are talking about the slowest part of the code. In this case, if your threads spend most of their time waiting for the disk load to complete, then you are going to get very little speed-up from multi-threading, as the disk can only load so much at once. In fact, when you try to load 4 times as much at once, you will find that you wait around just as long for the loads to complete. In the single-threaded case you wait for a file to load and then compress it. In the 4-threaded case you wait around 4 times as long for all the loads to complete and then compress all 4 files simultaneously. This is why you get only a small speed-up. Because you spend most of your time waiting for the loads to complete, you won't see anything approaching a 4x speed-up. Hence the limiting factor of your method is not the compression but loading the files from disk, which is why it gets called a bottleneck.
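To put rough numbers on this, here is a back-of-the-envelope model. It assumes (and this is an assumption, not a measurement) that the disk loads are fully serialized while the compression scales perfectly across threads. The function name and the 54 s / 6 s split are hypothetical, picked only to resemble the 1-minute figure in the question:

```cpp
#include <cassert>

// Hypothetical model: I/O time is not reduced by extra threads,
// CPU (compression) time is divided evenly among them.
double estimated_seconds(double io_seconds, double cpu_seconds, int threads) {
    assert(threads > 0);
    return io_seconds + cpu_seconds / threads;
}

// estimated_seconds(54.0, 6.0, 1) gives 60.0 (single-threaded)
// estimated_seconds(54.0, 6.0, 4) gives 55.5 (four threads)
```

Under those assumptions, four threads save only about 4.5 seconds out of 60, which is in the same range as the 3 to 5 second difference reported.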

Edit2: In a case such as the one you are describing, the best speed-up would come from reducing the amount of time you spend waiting for data to load from disk.

1) If you load a file in multiples of the disk sector size (commonly 512 or 4096 bytes, but you can query Windows to get the size) you get the best possible load performance. If you load sizes that aren't a multiple of this you can take quite a serious performance hit.
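A minimal sketch of point 1, assuming you have already obtained the block size (on Windows, `GetDiskFreeSpace` reports bytes per sector). The names `round_up_to_multiple` and `read_in_blocks` are made up for illustration, and a plain `std::istream` stands in for whatever file API you actually use:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <vector>

// Rounds `size` up to the next multiple of `block`.
std::size_t round_up_to_multiple(std::size_t size, std::size_t block) {
    assert(block > 0);
    return ((size + block - 1) / block) * block;
}

// Reads the whole stream using a buffer whose size is a multiple of `block`.
// Appends the data to `out` and returns the number of read calls that
// delivered data.
std::size_t read_in_blocks(std::istream& in, std::size_t wanted_chunk,
                           std::size_t block, std::vector<char>& out) {
    std::vector<char> buffer(round_up_to_multiple(wanted_chunk, block));
    std::size_t reads = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) break;
        out.insert(out.end(), buffer.data(), buffer.data() + got);
        ++reads;
    }
    return reads;
}
```

For example, asking for 5000-byte chunks with a 4096-byte block would use an 8192-byte buffer.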

2) Look at asynchronous loading. For example, you could be loading all of file 2 (or more) into memory while you are processing file 1. This means that you aren't waiting around for the load to complete. It's unlikely, though, that you'll get a vast speed-up here, as you'll probably still end up waiting for the load. The other thing to try is to load "chunks" of the file asynchronously, i.e.:

  • Load chunk 1.
  • Start chunk 2 loading.
  • Process chunk 1.
  • Wait for chunk 2 to load.
  • Start chunk 3 loading.
  • Process chunk 2.
  • (And so on)
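The steps above can be sketched roughly as follows. This is a portable illustration using `std::async` rather than Win32 overlapped I/O, and every name in it (`load_chunk`, `process_chunk`, `process_stream_overlapped`, `process_string_overlapped`) is hypothetical; `process_chunk` just counts bytes as a stand-in for the real compression call:

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads up to chunk_size bytes; an empty result means end of input.
std::vector<char> load_chunk(std::istream& in, std::size_t chunk_size) {
    std::vector<char> chunk(chunk_size);
    in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
    chunk.resize(static_cast<std::size_t>(in.gcount()));
    return chunk;
}

// Stand-in for the real work (e.g. compression): just counts bytes.
std::size_t process_chunk(const std::vector<char>& chunk) {
    return chunk.size();
}

// Processes the current chunk while the next one loads in the background.
std::size_t process_stream_overlapped(std::istream& in, std::size_t chunk_size) {
    std::size_t total = 0;
    std::vector<char> current = load_chunk(in, chunk_size);  // load chunk 1
    while (!current.empty()) {
        // Start the next chunk loading on another thread.
        std::future<std::vector<char>> next =
            std::async(std::launch::async, load_chunk, std::ref(in), chunk_size);
        total += process_chunk(current);  // process while the load runs
        current = next.get();             // wait for the next chunk
    }
    return total;
}

// Convenience wrapper for trying the pipeline on in-memory data.
std::size_t process_string_overlapped(const std::string& data, std::size_t chunk_size) {
    std::istringstream in(data);
    return process_stream_overlapped(in, chunk_size);
}
```

Only one thread touches the stream at any moment: the background task reads while the main thread works on data it already holds. Whether this helps in practice depends on how long processing a chunk takes compared with loading one.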

3) You could just buy a faster disk drive.

Goz
Yes, it looks more like the I/O is the bottleneck.
shinkou
It's you again, Goz. I hope you can show me a way. The thing is, I don't know much about the internals of the compression code (it's in another DLL, but it's thread-safe). I am waiting for all 4 threads; I showed only one in the example. I tried using WaitForMultipleObjects, but it does not wait for the last thread to finish. It's as if the 4th thread runs after the wait, and that's why I used WaitForSingleObject.
kiddo
Also, I am not sure why you are using the word "bottleneck". What does it mean in my case? Please explain.
kiddo
So, in any case involving file processing, how can we improve the performance? Won't multithreading help? Or is there any other method?
kiddo
One little update for ya :)
Goz