views: 1958

answers: 15

Hello everyone,

Here is my sample program for the web service server side and client side. I have run into a strange performance problem: even if I increase the number of threads that call the web service, the performance does not improve. At the same time, the CPU/memory/network consumption shown on the Performance tab of Task Manager is low. I am wondering what the bottleneck is and how to improve it?

(In my tests, doubling the number of threads almost doubles the total response time.)

Client side:

using System;
using System.Threading;

class Program
{
    static Service1[] clients = null;
    static Thread[] threads = null;

    static void ThreadJob (object index)
    {
        // query 100 times
        for (int i = 0; i < 100; i++)
        {
            clients[(int)index].HelloWorld();
        }
    }

    static void Main(string[] args)
    {
        Console.WriteLine("Specify number of threads: ");
        int number = Int32.Parse(Console.ReadLine());

        clients = new Service1[number];
        threads = new Thread[number];

        for (int i = 0; i < number; i++)
        {
            clients[i] = new Service1();
            ParameterizedThreadStart starter = new ParameterizedThreadStart(ThreadJob);
            threads[i] = new Thread(starter);
        }

        DateTime begin = DateTime.Now;

        for (int i = 0; i < number; i++)
        {
            threads[i].Start(i);
        }

        for (int i = 0; i < number; i++)
        {
            threads[i].Join();
        }

        Console.WriteLine("Total elapsed time (s): " + (DateTime.Now - begin).TotalSeconds);
    }
}

Server side:

    [WebMethod]
    public double HelloWorld()
    {
        return new Random().NextDouble();
    }

thanks in advance, George

+3  A: 

My experience is generally that locking is the problem: I had a massively parallel server once that spent more time context switching than it did performing work.

So, check your memory and process counters in perfmon; if you look at context switches and the rate is high (more than 4,000 per second), then you're in trouble.

You can also check your memory stats on the server - if it's spending all its time swapping, or just creating and freeing strings, it will appear to stall as well.

Lastly, check disk I/O, same reason as above.

The resolution is to remove your locks, or to hold them for the minimum time possible. Our problem was solved by removing the dependence on COM BSTRs and their global lock; you'll find that C# has plenty of similar synchronisation bottlenecks (intended to keep your code working safely). I've seen performance drop when I moved a simple C# app from a single-core to a multi-core box.
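For example, here's a minimal sketch of the "hold locks briefly" idea - the shared results list is hypothetical, not something in the code above; the point is just that the lock wraps only the cheap shared update, not the slow call:

    // hypothetical shared state (needs System.Collections.Generic)
    static readonly object resultsLock = new object();
    static readonly List<double> results = new List<double>();

    static void CallAndRecord(Service1 client)
    {
        // the slow web service call happens outside the lock
        double value = client.HelloWorld();

        lock (resultsLock)
        {
            // only the cheap shared-state update is serialised
            results.Add(value);
        }
    }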

If you cannot remove the locks, the best option is not to create as many threads :) Use a thread pool instead to let the CPU finish one job before starting another.

gbjbaanb
A: 

Well, in this case, you're not really balancing your work between the chosen number of threads... Each thread you create performs the same job. So if you create n threads and you have limited parallel processing capacity, the performance naturally decreases. Another thing I noticed is that the job is a relatively fast operation, even over 100 iterations, and even if you plan to divide it across multiple threads, you need to consider that the time spent on context switching and thread creation/deletion will be an important factor in the overall time.

bruno conde
A: 

Thanks gbjbaanb,

You are a guru on this topic. I have three more questions:

  1. I am using Windows Server 2003 and opened perfmon; under both the Processor and Process performance objects, there is no counter called context switch number or anything similar. I would appreciate it if you could let me know the exact performance counter name and which category it is under.

  2. "The resolution is to remove your locks" -- confused. If you looked at my code, I did not add any locks explicitly. :-)

  3. "check your memory and process counters in perfmon" -- is it ok to diagnoise memory/process issue using performance panel of task manager?

regards, George

1. System object -> Context Switches/sec. 2. Too bad :) frameworks often put locks in to help you. 3. I'd use perfmon; it's much more flexible and easier once you've used it. You can save logs and graphs in real time too, and there are lots more counters - e.g. GC collections.
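If you want to watch that counter from code rather than from the perfmon UI, a minimal sketch using System.Diagnostics.PerformanceCounter (a standalone console app, nothing specific to the project in the question) would be:

using System;
using System.Diagnostics;
using System.Threading;

class ContextSwitchMonitor
{
    static void Main()
    {
        // "System" category, "Context Switches/sec" counter (machine-wide, no instance name)
        using (PerformanceCounter counter = new PerformanceCounter("System", "Context Switches/sec"))
        {
            counter.NextValue(); // the first sample is always 0; prime the counter

            for (int i = 0; i < 10; i++)
            {
                Thread.Sleep(1000);
                Console.WriteLine("Context switches/sec: {0:F0}", counter.NextValue());
            }
        }
    }
}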
gbjbaanb
A: 

Thanks bruno conde!

If you think it is context switching that degrades the performance, how can I prove that context switching is the bottleneck?

regards, George

+4  A: 

Although you are creating a multithreaded client, bear in mind that .NET has a configurable bottleneck of 2 simultaneous calls to a single host. This is by design. Note that this is on the client, not the server.

Try adjusting your app.config file in the client:

<system.net>
  <connectionManagement>
    <add address="*" maxconnection="20" />
  </connectionManagement>
</system.net>
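If you'd rather set this in code than in app.config, the same limit can be raised through ServicePointManager (just a sketch; 20 is an illustrative value matching the config above):

    using System.Net;

    class ClientSetup
    {
        public static void RaiseConnectionLimit()
        {
            // Same effect as maxconnection in <connectionManagement>:
            // allow up to 20 simultaneous HTTP connections per host (the default is 2).
            ServicePointManager.DefaultConnectionLimit = 20;
        }
    }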

There is some more info on this in this short article:

markt
A: 

Thanks markt, I have tried it, but there is no performance improvement. Any ideas why? And what is the real performance bottleneck?

regards, George

A: 

As bruno mentioned, your web method is a very quick operation. As an experiment, try making your HelloWorld method take a bit longer: throw in a Thread.Sleep(1000) before you return the random double. This makes it more likely that your service is actually forced to process requests in parallel. Then try your client with different numbers of threads and see how the performance differs.
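Roughly like this on the server side (the 1000 ms delay is arbitrary; it just needs to dwarf the per-call overhead):

    [WebMethod]
    public double HelloWorld()
    {
        // artificial delay so concurrent requests actually overlap on the server
        Thread.Sleep(1000);
        return new Random().NextDouble();
    }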

markt
A: 

Hi markt,

1.

One question: my purpose is to increase the number of requests handled from the client side, so I want to improve performance -- you suggested making the web service slower on the server side to simulate a real scenario (I agree it simulates reality better than mine does) -- but that does not answer my question about how to improve performance;

2.

I have tried using more threads, and even a thread pool, but the performance is almost the same. I also noticed that the network is not 100% utilized (only about 1%). So I think there is some bottleneck to performance?

regards, George

A: 

Hi gbjbaanb and markt,

I have tried rewriting my code to use the thread pool so that fewer threads are created (to try to prove that thread context switching is the bottleneck), but the performance never improves. Here is my code; any ideas?

    static Service1[] clients = null;
    static Thread[] threads = null;
    static ManualResetEvent[] doneEvents = null;

    static void ThreadJob (object index)
    {
        // query 100 times
        for (int i = 0; i < 100; i++)
        {
            clients[(int)index].HelloWorld();
        }

        doneEvents[(int)index].Set();
    }

    static void Main(string[] args)
    {
        Console.WriteLine("Specify number of threads: ");
        int number = Int32.Parse(Console.ReadLine());

        clients = new Service1[number];
        threads = new Thread[number];
        doneEvents = new ManualResetEvent[number];

        // start timing before queuing, because QueueUserWorkItem begins running jobs immediately
        DateTime begin = DateTime.Now;

        for (int i = 0; i < number; i++)
        {
            doneEvents[i] = new ManualResetEvent(false);
            clients[i] = new Service1();
            clients[i].EnableDecompression = true;
            ThreadPool.QueueUserWorkItem(ThreadJob, i);
            // ParameterizedThreadStart starter = new ParameterizedThreadStart(ThreadJob);
            // threads[i] = new Thread(starter);
        }

        /*
        for (int i = 0; i < number; i++)
        {
            threads[i].Start(i);
        }

        for (int i = 0; i < number; i++)
        {
            threads[i].Join();
        }
        */

        WaitHandle.WaitAll(doneEvents);

        Console.WriteLine("Total elapsed time (s): " + (DateTime.Now - begin).TotalSeconds);

        Console.ReadLine();
    }

regards, George

+2  A: 

I don't believe that you are running into a bottleneck at all actually.

Did you try what I suggested ?

Your idea is to add more threads to improve performance, because you are expecting that all of your threads will run perfectly in parallel. This is why you are assuming that doubling the number of threads should not double the total test time.

Your service takes a fraction of a second to return and your threads will not all start working at exactly the same instant in time on the client.

So your threads are not actually working completely in parallel as you have assumed, and the results you are seeing are to be expected.

markt
A: 

Thanks markt,

1.

"Did you try what I suggested ?" -- I have tried if you mean adding sleep at web service server side. Here is my code. But if I add Sleep, the performance is even worse. Previously I have tested using 10 threads each sending 100 requests to server side, the performance is about 13 seconeds, but when using the new code of Sleep, the same amount of requests and threads will use 58 seconds.

Here is my code. Is the code what you mean and the performance drop you expected? :-)

    [WebMethod]
    public double HelloWorld()
    {
        Thread.Sleep(500);
        return new Random().NextDouble();
    }

2.

Another point of confusion about "I don't believe that you are running into a bottleneck at all actually." In my experience, if only a small part of the CPU/memory/network is being used, there should be plenty of room to improve. Any comments or ideas?

regards, George

+1  A: 

Of course adding Sleep will not improve performance.

But the point of the test is to test with a variable number of threads. So, keep the Sleep in your WebMethod.

And try now with 5, 10, 20 threads.

If there are no other problems with your code, then the increase in time should not be linear as before.

You realize that in your test, when you double the number of threads, you are doubling the amount of work that is being done. So if your threads are not truly executing in parallel, then you will, of course, see a linear increase in total time...

I ran a simple test using your client code (with a sleep on the service). For 5 threads, I saw a total time of about 53 seconds, and for 10 threads, 62 seconds. So, for 2x the number of calls to the web service, it only took about 17% more time (62/53 ≈ 1.17). That is what you were expecting, no?

markt
A: 

Sorry markt, there is something wrong and I can not comment. Let me reply here this time.

I have tried your solution, and using more threads works, but in my experiments I could only improve performance by up to 20% with more threads. At the same time, CPU/memory/network usage is below 50%, so I think there is still a lot of room for improvement. Any ideas?

BTW: am I correct in thinking that if only a small part of the CPU/memory/network is being used, there is definitely room to improve performance?

regards, George

George2
A: 

Try using a processor-consuming task instead of Thread.Sleep. Actually, a combined approach is best.

Sleep just hands the thread's time slice over to another thread.
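For example, a rough sketch of a CPU-bound stand-in for the web method (the loop bound is arbitrary; it just needs to take a noticeable amount of time):

    [WebMethod]
    public double HelloWorld()
    {
        // burn some CPU instead of (or in addition to) sleeping,
        // so the server actually has work to schedule across cores
        double sum = 0;
        for (int i = 1; i <= 5000000; i++)
        {
            sum += Math.Sqrt(i);
        }
        return sum + new Random().NextDouble();
    }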

+1  A: 

You are not seeing any performance gain because there is none to be had. The one line of code in your service (below) probably executes without a context switch most of the time anyway.

return new Random().NextDouble();

The overhead involved in the web service call is higher than the work you are doing inside of it. If you have some substantial work to do inside the service (database calls, look-ups, file access, etc.), you may begin to see some performance increase. Just parallelizing a task will not automatically make it faster.

-Jason

Jason Hernandez