views: 290 · answers: 5

We are implementing a C# application that needs to make large numbers of socket connections to legacy systems. We will (likely) be using a 3rd party component to do the heavy lifting around terminal emulation and data scraping. We have the core functionality working today, now we need to scale it up.

During peak times this may be thousands of concurrent connections - aka threads (and even tens of thousands several times a year) that need to be opened. These connections mainly sit idle (no traffic other than a periodic handshake) for minutes (or hours) until the legacy system 'fires an event' we care about; we then scrape some data from this event, perform some workflow, and then wait for the next event. There is no value in pooling (as far as we can tell) since threads will rarely need to be reused.

We are looking for any good patterns or tools that will help us use this many threads efficiently. Running on high-end server hardware is not an issue, but we do need to limit the application to just a few servers, if possible.

In our testing, creating a new thread and initializing the 3rd party control uses a lot of CPU initially, but then drops to near zero. Memory use seems to be about 800 MB per 1000 threads.

Is there anything better / more efficient than just creating and starting the number of threads needed?

PS - Yes, we know it is bad to create this many threads, but since we have no control over the legacy applications, this seems to be our only alternative. There is no option for multiple events to come across a single socket / connection.

Thanks for any help or pointers! Vans

+2  A: 

You say this:

There is no value in pooling (as far as we can tell) since threads will rarely need to be reused.

But then you say this:

Is there anything better / more efficient than just creating and starting the number of threads needed?

Why the discrepancy? Do you care about the number of threads you are creating or not? Thread pooling is the proper way to handle large numbers of mostly-idle connections. A few busy threads can handle many idle connections easily and with fewer resources required.
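
As a rough sketch of that separation (illustrative only; the Connection type, OnConnectionActivity and HandleEvent names are hypothetical placeholders, not the poster's code): the connections stay open indefinitely, and a pooled thread is borrowed only when one of them actually has work to do.

// Connections stay open; only an active one borrows a pooled thread briefly.
// Connection / HandleEvent are hypothetical placeholders.
void OnConnectionActivity(Connection conn)
{
    ThreadPool.QueueUserWorkItem(state =>
    {
        var c = (Connection)state;
        c.HandleEvent();        // scrape the event and run the workflow
        // nothing is closed here; the connection goes back to waiting
    }, conn);
}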

1800 INFORMATION
+1, sounds like thread pooling is exactly what the poster is looking for.
Kevin Nisbet
Actually this is not really what thread pooling is for. Without some ninja juggling of sockets between a main thread waiting on some wait handle and dispatching to the pool, you would quickly exhaust your actual running threads with idle connections, and the rest would be queued to be serviced by threads if one ever happens to wake.
joshperry
How is this a discrepancy? I said we don't think there is value in a pool for us, and asked for a pattern for managing threads. A pool may be one method, but we are looking for others. Maybe the answer is that we just need to open that many threads and deal with it, but I thought we should look around first. Once we release a thread back to the pool, the connection is closed, right? This does not help us; we need a persistent connection.
Vans
The pool of threads is typically managed separately from the pool of connections - you can keep the connections open for as long as you like, then as and when required, hand the connection off to one of the pooled threads in order to handle the activity.
1800 INFORMATION
A: 

I would really look into MPI.NET. More info: MPI. MPI.NET also has some parallel reduction support, so it will work well to aggregate results.

choudeshell
Thanks, but I don't think this helps us. We have no control of the legacy systems, and can't implement new protocols there.
Vans
+1  A: 

Use the socket's asynchronous BeginReceive and BeginSend. These dispatch the IO operation to the operating system and return immediately.

You pass a delegate and some state to those methods that will be called when an IO operation completes.

Generally once you are done processing the IO then you immediately call BeginX again.

Socket sock = GetSocket();
State state = new State() { Socket = sock, Buffer = new byte[1024], ThirdPartyControl = GetControl() };

sock.BeginReceive(state.Buffer, 0, state.Buffer.Length, SocketFlags.None, ProcessAsyncReceive, state);

void ProcessAsyncReceive(IAsyncResult iar)
{
    State state = iar.AsyncState as State;

    int bytesRead = state.Socket.EndReceive(iar);
    if (bytesRead == 0)
        return; // the remote side closed the connection

    // Process the received data in state.Buffer here
    state.ThirdPartyControl.ScrapeScreen(state.Buffer);

    // Immediately post the next receive so the socket keeps listening
    state.Socket.BeginReceive(state.Buffer, 0, state.Buffer.Length, SocketFlags.None, ProcessAsyncReceive, state);
}

public class State
{
    public Socket Socket { get; set; }
    public byte[] Buffer { get; set; }
    public ThirdPartyControl ThirdPartyControl { get; set; }
}

BeginSend is used in a similar fashion, as well as BeginAccept if you are accepting incoming connections.
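
For completeness, a minimal BeginSend sketch in the same pattern (not part of the original answer; the SendAsync helper name is just for illustration):

void SendAsync(Socket sock, byte[] data)
{
    // Hand the send off to the OS and return immediately
    sock.BeginSend(data, 0, data.Length, SocketFlags.None, ProcessAsyncSend, sock);
}

void ProcessAsyncSend(IAsyncResult iar)
{
    Socket sock = (Socket)iar.AsyncState;
    int bytesSent = sock.EndSend(iar);
    // EndSend reports how many bytes were actually sent; if the buffer was
    // only partially sent, issue another BeginSend for the remainder.
}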

With low-throughput operations, async communication can easily handle thousands of clients simultaneously.

joshperry
I'll second that. It appears that the OP and some commenters misunderstand the relationship between threads and IO. When using the Async methods, callbacks occur on IOCP threads from the OS. These callbacks are subsequently dealt with by threads from the threadpool, used fleetingly to collect the data and then released back to the pool. If the callback handler is sufficiently light, very few threads can service a huge number of connections. +1
spender
Thanks. We will look into this. The only concern is our 3rd party control - I'm not sure how well it will do with multiple connections on a single thread.
Vans
This may help us. We'll have to write our own code to handle the line control and data scraping.
Vans
A: 

I would suggest utilizing the Socket.Select() method and grouping the handling of multiple socket connections into a single thread.

You could, for example, create a thread for every 50 connections to the legacy system. These master threads would just keep calling Socket.Select(), waiting for data to arrive. Each master thread could then hand sockets that have data off to a thread pool for actual processing. Once the processing is complete, the socket could be passed back to the master thread.
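
A minimal sketch of that master loop, assuming a hypothetical ProcessSocket() handler and a list of already-open connections (illustrative, not code from the answer):

// Master thread for a group of ~50 open sockets: Select() waits until
// one of them has data, then only the ready ones borrow a pooled thread.
// ProcessSocket() is a hypothetical placeholder.
void MasterLoop(List<Socket> connections)
{
    while (true)
    {
        // Select modifies the list in place, leaving only sockets with data,
        // so pass a copy to preserve the full watch list.
        var readable = new List<Socket>(connections);
        Socket.Select(readable, null, null, 1000000); // timeout in microseconds

        foreach (Socket s in readable)
        {
            // In a real implementation the socket should be removed from the
            // watch list until processing finishes, to avoid double dispatch.
            ThreadPool.QueueUserWorkItem(o => ProcessSocket((Socket)o), s);
        }
    }
}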

Steve Wranovsky
A: 

There are a number of patterns using Microsoft's Concurrency and Coordination Runtime (CCR) that make dealing with IO easy and light. It allows us to grab and process well over 6000 web pages a minute (could go much higher, but there's no need) in a crawler we are developing. Definitely worth the time investment required to shift your head into the CCR way of doing things. There's a great article here:

http://msdn.microsoft.com/en-us/magazine/cc163556.aspx
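
To give a flavour of the style, here is a minimal CCR-ish sketch from memory of the Microsoft.Ccr.Core types (treat the exact constructors and signatures as an approximation, and see the linked article for the real API):

// Post received buffers to a Port; a small dispatcher pool runs the handler.
// Types and signatures recalled from Microsoft.Ccr.Core; approximate only.
using (var dispatcher = new Dispatcher(0, "scraper pool"))        // 0 = one thread per core
using (var queue = new DispatcherQueue("events", dispatcher))
{
    var events = new Port<byte[]>();

    // Persistent receiver: the handler runs on a pooled thread for every Post
    Arbiter.Activate(queue, Arbiter.Receive<byte[]>(true, events,
        delegate(byte[] buffer)
        {
            // scrape the screen / run the workflow here
        }));

    // Socket completion callbacks (e.g. after EndReceive) simply post the data:
    events.Post(new byte[] { 0x01, 0x02 });
}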

spender
Anyone else having trouble with the link?
spender