I have a networking Linux application which receives RTP streams from multiple sources, does very simple packet modification, and then forwards the streams to the final destination.

How do I decide how many threads I should have to process the data? I suppose I cannot open a thread for each RTP stream, as there could be thousands. Should I take the number of CPU cores into account? What else matters? Thanks.

+2  A: 

I would look into a thread pool for this application.

http://threadpool.sourceforge.net/

Allow the thread pool to manage your threads and the queue.

You can tweak the maximum and minimum number of threads used based on performance profiling later.
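
A minimal sketch of what usage looks like, going by the pool/schedule interface shown in that library's examples (treat the exact names as assumptions and check its documentation):

    #include "threadpool.hpp"   // the library linked above

    void process_packet()
    {
        // parse, modify and forward one RTP packet here
    }

    int main()
    {
        // Fixed-size pool; tune the size after profiling.
        boost::threadpool::pool tp(4);

        // Each incoming packet becomes a task on the pool's queue.
        tp.schedule(&process_packet);

        tp.wait();  // block until all scheduled tasks have run
        return 0;
    }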

Geoffrey Chetwood
A: 

It is a good idea to avoid creating one (or even N) threads per client request. This approach is classically non-scalable, and you will definitely run into problems with memory usage or context switching. You should instead use a thread pool approach and treat incoming requests as tasks for any thread in the pool to handle (see the sketch below).

The scalability of this approach is then limited by the ideal number of threads in the pool, which is usually related to the number of CPU cores. You want each thread to use exactly 100% of the CPU on a single core, so in the ideal case you would have one thread per core, which reduces context switching to zero. Depending on the nature of the tasks this might not be possible: if the threads have to wait for external data, read from disk, or the like, you may find that the number of threads has to be increased by some scaling factor.
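
To make the pattern concrete, here is a minimal sketch of a fixed-size pool in portable C++11 (the names and structure are illustrative): one task queue, one worker per core, each worker pulling tasks in a loop.

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class ThreadPool {
    public:
        explicit ThreadPool(unsigned n) : stop_(false) {
            for (unsigned i = 0; i < n; ++i)
                workers_.emplace_back([this] { run(); });
        }
        ~ThreadPool() {
            { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
            cv_.notify_all();
            for (auto &w : workers_) w.join();
        }
        // Incoming requests are enqueued as tasks for any worker.
        void submit(std::function<void()> task) {
            { std::lock_guard<std::mutex> lk(m_); tasks_.push(std::move(task)); }
            cv_.notify_one();
        }
    private:
        void run() {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lk(m_);
                    cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
                    if (stop_ && tasks_.empty()) return;
                    task = std::move(tasks_.front());
                    tasks_.pop();
                }
                task();  // process one request
            }
        }
        std::vector<std::thread> workers_;
        std::queue<std::function<void()>> tasks_;
        std::mutex m_;
        std::condition_variable cv_;
        bool stop_;
    };

    int main() {
        // One thread per core as a starting point; apply a scaling
        // factor if the tasks spend time blocked on IO.
        unsigned cores = std::thread::hardware_concurrency();
        ThreadPool pool(cores ? cores : 2);
        pool.submit([] { /* handle one request */ });
    }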

1800 INFORMATION
+4  A: 

Classically, the reasonable number of threads depends on the number of execution units, the ratio of IO to computation, and the available memory.

Number of Execution Units (XU)

This counts how many threads can be active at the same time. Depending on your computation, this may or may not include things like hyperthreads -- mixed instruction workloads benefit more from them.

Ratio of IO to Computation (%IO)

If the threads never wait for IO but always compute (%IO = 0), using more threads than XUs only increases the overhead of memory pressure and context switching. If the threads always wait for IO and never compute (%IO = 1), then using a variant of poll() or select() might be a good idea.

For all other situations, XU / (1 - %IO) gives an approximation of how many threads are needed to fully use the available XUs. For example, with XU = 4 and threads that spend 75% of their time waiting on IO (%IO = 0.75), you would want roughly 4 / 0.25 = 16 threads.

Available Memory (Mem)

This is more of an upper limit. Each thread uses a certain amount of system resources (MemUse), so Mem / MemUse gives you an approximation of how many threads the system can support.
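
Putting the two limits together, a quick estimate might look like this (a sketch; the variable names mirror the definitions above, and the numbers are only examples):

    #include <algorithm>
    #include <cstdio>

    int main() {
        double xu      = 4;      // execution units (e.g. cores)
        double io      = 0.75;   // fraction of time a thread waits on IO
        double mem     = 2e9;    // memory available for threads, bytes
        double mem_use = 8e6;    // memory per thread (stack etc.), bytes

        double by_cpu = xu / (1.0 - io);  // threads to keep the XUs busy
        double by_mem = mem / mem_use;    // threads the memory can support

        // The usable number is the smaller of the two limits: here
        // 4 / 0.25 = 16 threads, well under the memory cap of 250.
        std::printf("threads ~ %.0f\n", std::min(by_cpu, by_mem));
        return 0;
    }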

Other Factors

Even if you can guess or (better) measure the numbers above, the performance of the whole system can still be constrained by other factors. For example, another service running on the system might use some of the XUs and memory. Another limit is the generally available IO bandwidth (IOCap): if you need less computation per transferred byte than your XUs provide, you will need to care less about using them completely and more about increasing IO throughput.

For more about this latter problem, see this Google Talk about the Roofline Model.

David Schmitt
+14  A: 

It is important to understand the purpose of using multiple threads on a server: many threads serve to decrease latency rather than to increase speed. You don't make the CPU any faster by having more threads, but you make it more likely that a thread will be available within a given period to handle a request.

Having a bunch of threads which just move data in parallel is a rather inefficient shotgun approach (and creating one thread per request simply fails completely). Using the thread pool pattern is a more effective, focused way of decreasing latency.

Now, in the thread pool, you want to have at least as many threads as you have CPUs/cores. You can have more than that, but the extra threads will again only decrease latency, not increase speed.

Think of organizing server threads as akin to organizing a line in a supermarket. Would you rather have a lot of cashiers who work more slowly, or one cashier who works super fast? The problem with the fast cashier isn't speed, but that one customer with a lot of groceries can still take up a lot of their time. The need for many threads comes from the possibility that a few requests will take a lot of time and block all your threads. By this reasoning, whether you benefit from many slower cashiers depends on whether your customers have similar amounts of groceries or wildly different amounts.

Getting back to the basic model, this means you have to experiment with your thread count to figure out what is optimal for the particular characteristics of your traffic, looking at the time taken to process each request.

Joe Soul-bringer
+1 ditto. Awesome.
Adam Davis
+4  A: 

I'd say try using just ONE thread; it makes programming much easier. Although you'll need to use something like libevent to multiplex the connections, you won't have any unexpected synchronisation issues.
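
As an illustration of that single-threaded, multiplexed style, here is a rough libevent 2.x sketch of a UDP relay; the addresses, the port, and the packet-modification step are placeholders, and error handling is omitted:

    #include <event2/event.h>
    #include <event2/util.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <string.h>

    static struct sockaddr_in dest;  // final destination of the stream

    // Called by libevent whenever the socket is readable.
    static void on_rtp(evutil_socket_t fd, short /*events*/, void * /*arg*/)
    {
        char buf[2048];
        struct sockaddr_in src;
        socklen_t len = sizeof(src);
        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                             (struct sockaddr *)&src, &len);
        if (n <= 0) return;
        // ... do the simple packet modification on buf here ...
        sendto(fd, buf, n, 0, (struct sockaddr *)&dest, sizeof(dest));
    }

    int main()
    {
        evutil_socket_t fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in local;
        memset(&local, 0, sizeof(local));
        local.sin_family = AF_INET;
        local.sin_addr.s_addr = htonl(INADDR_ANY);
        local.sin_port = htons(5004);  // placeholder RTP listening port
        bind(fd, (struct sockaddr *)&local, sizeof(local));
        evutil_make_socket_nonblocking(fd);

        memset(&dest, 0, sizeof(dest));
        dest.sin_family = AF_INET;
        dest.sin_port = htons(5004);
        inet_pton(AF_INET, "192.0.2.1", &dest.sin_addr);  // placeholder

        struct event_base *base = event_base_new();
        struct event *ev = event_new(base, fd, EV_READ | EV_PERSIST,
                                     on_rtp, NULL);
        event_add(ev, NULL);
        event_base_dispatch(base);  // the single thread's event loop
        return 0;
    }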

Once you've got a working single-threaded implementation, you can do performance testing and make a decision on whether a multi-threaded one is necessary.

Even if a multithreaded implementation turns out to be necessary, it may be easier to break the work into several processes instead of threads (i.e. not sharing an address space: either fork() or exec multiple copies of the process from a parent), as long as they don't have a lot of shared data.

You could also consider using something like Python's "Twisted" to make implementation easier (this is what it's designed for).

Really, there's probably not a good case for using threads over processes -- but maybe there is in your case; it's difficult to say. It depends on how much data you need to share between threads.

MarkR
+2  A: 

Listen to the people advising you to use libevent (or OS-specific utilities such as epoll/kqueue). With many connections this is an absolute must because, as you said, creating threads will be an enormous performance hit, and select() also doesn't quite cut it.
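
For the raw-OS route, the skeleton with epoll looks roughly like this (a sketch with setup details and error handling omitted; on the BSDs the equivalent would be kqueue):

    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main()
    {
        int listen_fd = socket(AF_INET, SOCK_DGRAM, 0);
        // ... bind() and set non-blocking as usual ...

        int epfd = epoll_create1(0);
        struct epoll_event ev;
        ev.events = EPOLLIN;       // notify when readable
        ev.data.fd = listen_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        struct epoll_event ready[64];
        for (;;) {
            // One syscall waits on all registered sockets at once,
            // unlike select(), whose cost grows with the fd count.
            int n = epoll_wait(epfd, ready, 64, -1);
            for (int i = 0; i < n; ++i) {
                // recv(), modify and forward the packet on ready[i].data.fd
            }
        }
    }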

A: 

Let your program decide. Add code to it that measures throughput and increases/decreases the number of threads dynamically to maximize it.

This way, your application will always perform well, regardless of the number of execution cores and other factors.
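
A sketch of what such a feedback loop could look like, assuming a pool that can be resized at runtime (the resize() call here is hypothetical):

    #include <algorithm>
    #include <atomic>
    #include <chrono>
    #include <thread>

    std::atomic<long> packets_done(0);  // workers increment this per packet

    // Hypothetical controller: hill-climb the thread count by comparing
    // throughput before and after each change.
    void tune_pool(/* Pool &pool */)
    {
        int threads = 4, step = 1;
        long last_rate = 0;
        for (;;) {
            long before = packets_done.load();
            std::this_thread::sleep_for(std::chrono::seconds(1));
            long rate = packets_done.load() - before;  // packets/sec

            if (rate < last_rate)
                step = -step;            // last change hurt: reverse direction
            threads = std::max(1, threads + step);
            // pool.resize(threads);     // hypothetical resize operation
            last_rate = rate;
        }
    }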

dmityugov
Why wouldn't you just use a thread pool then?
Geoffrey Chetwood
Does a thread pool measure throughput in any way?
dmityugov
A given thread pool implementation could potentially self-optimize (GNU Make does this), but that's certainly not the status quo.
Tom