You generally need to use an asynchronous model for these long-lived connections to work. There are several different techniques for doing asynchronous I/O, all of which have their advantages and disadvantages.
One that should already be familiar to anyone who has worked with JavaScript and AJAX is the callback model, in which you send a request and install a callback to be called when it completes. This is how XMLHttpRequest works without blocking the page (or any other page) while one request waits to finish. This is also how the Twisted networking framework for Python works, though it can call either methods on objects or plain callback functions, depending on which interfaces you use.
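To make the callback model concrete, here is a minimal, single-threaded sketch in Python. It is not how XMLHttpRequest or Twisted are implemented: the `EventLoop` class and the `fetch` function are hypothetical, and a simulated clock stands in for real network latency so the result is deterministic. The point is that `fetch` returns immediately and its callback runs later, so the slow request never holds up the fast one.

```python
import heapq
import itertools

class EventLoop:
    """Toy single-threaded loop: callbacks run one at a time, in deadline order."""

    def __init__(self):
        self._timers = []               # heap of (deadline, seq, callback, args)
        self._seq = itertools.count()   # tie-breaker so callbacks are never compared
        self.now = 0.0                  # simulated clock, keeps the example deterministic

    def call_later(self, delay, callback, *args):
        heapq.heappush(self._timers, (self.now + delay, next(self._seq), callback, args))

    def run(self):
        while self._timers:
            deadline, _, callback, args = heapq.heappop(self._timers)
            self.now = deadline         # jump the simulated clock forward
            callback(*args)

def fetch(loop, url, latency, on_done):
    """Hypothetical non-blocking request: returns at once, calls on_done later."""
    loop.call_later(latency, on_done, url, f"<body of {url}>")

results = []

def handle(url, body):
    results.append(url)

loop = EventLoop()
fetch(loop, "/slow", 2.0, handle)   # issued first...
fetch(loop, "/fast", 0.5, handle)   # ...but this one completes first
loop.run()
print(results)  # ['/fast', '/slow']
```

Neither `fetch` call waits for the other; completion order is decided by when each "response" arrives, which is exactly what lets one thread juggle many outstanding requests.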
Another powerful model is the Erlang-style approach, known as the Actor model, in which many, many lightweight processes (like threads, but with no shared state) communicate with each other via asynchronous messages. The Erlang runtime is designed to make spawning thousands of processes very efficient, so you can simply have one process for each connection and have it send messages to other processes that implement your application's backend. Erlang processes can also be scheduled automatically across multiple OS threads to take full advantage of multi-core systems. ejabberd, a popular server for the Jabber chat protocol (which requires many long-lived open connections), is implemented in Erlang, as is the Facebook Chat system.
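A rough Python sketch of the Actor structure follows, assuming a hypothetical `ChatRoom` backend actor. Each actor here is backed by a full OS thread, so this shows only the shape of the model (private state, an inbox, asynchronous sends, replies as messages), not Erlang's cheap processes or its scheduler.

```python
import queue
import threading

class Actor:
    """Owns private state; the only way to interact with it is via its inbox."""

    def __init__(self):
        self.inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg):
        self.inbox.put(msg)         # asynchronous: the sender does not wait

    def join(self):
        self._thread.join()

    def _run(self):
        while True:
            msg = self.inbox.get()  # handle one message at a time
            if msg[0] == "stop":
                return
            self.receive(msg)

class ChatRoom(Actor):
    """Hypothetical backend actor that keeps the room's message history."""

    def __init__(self):
        self.history = []           # only ever touched by this actor's own thread
        super().__init__()          # start the thread after state exists

    def receive(self, msg):
        kind, payload = msg
        if kind == "post":
            self.history.append(payload)
        elif kind == "history":
            payload.put(list(self.history))   # reply by sending to the caller's queue

room = ChatRoom()
for i in range(3):
    room.send(("post", f"message {i}"))   # e.g. sent by per-connection actors

reply = queue.Queue()
room.send(("history", reply))
history = reply.get(timeout=1)
print(history)  # ['message 0', 'message 1', 'message 2']

room.send(("stop", None))
room.join()
```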
The new Go language from Google uses a similar approach, in which lightweight goroutines communicate over channels. It is closer to Hoare's Communicating Sequential Processes (CSP) than to Erlang's Actor model, but the two have a lot in common.
In Mac OS X 10.6, Apple introduced Grand Central Dispatch (GCD), along with blocks (essentially closures) in C, C++, and Objective-C. This allows something like the AJAX- or Twisted-style event-driven callback model, but with explicitly managed queues: a serial queue executes its blocks one at a time, which controls access to shared resources in a multithreaded, multi-core environment. Twisted and JavaScript both run single-threaded, so they can only take advantage of a single core unless you use multiple operating system processes, which can be fairly heavyweight and increase the cost of communication between them.
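Grand Central Dispatch itself is a C API (`dispatch_queue_create`, `dispatch_async`, and friends). As an analogy only, a single-worker `ThreadPoolExecutor` in Python behaves like a serial queue: callables submitted to it run one at a time, in submission order, so a shared value touched only from that queue needs no lock, even while other work runs concurrently on a wider pool. The names `serial_queue`, `state`, and `add_to_total` are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a serial dispatch queue: one worker, so submitted callables
# run one at a time, in the order they were submitted.
serial_queue = ThreadPoolExecutor(max_workers=1)

# Shared resource; by convention it is only ever touched from serial_queue.
state = {"total": 0}

def add_to_total(value):
    # Read-modify-write with no lock: safe because only serial_queue runs it.
    state["total"] += value

def work(n):
    value = n * n                                    # independent work on the wider pool
    return serial_queue.submit(add_to_total, value)  # roughly dispatch_async onto the serial queue

with ThreadPoolExecutor(max_workers=4) as pool:
    pending = list(pool.map(work, range(10)))        # each result is a future from serial_queue

for future in pending:
    future.result()          # wait for every queued update to finish
serial_queue.shutdown(wait=True)
print(state["total"])  # 285
```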
Then there are the more traditional models, like the Unix select() function, or the more modern and capable epoll (Linux) or kqueue() (BSD and Mac OS X). In these, you generally have a main loop in your program that registers a set of events to watch for (more data arrives on a network connection, a new connection comes in, a socket can accept more output, etc.), and then makes a system call that blocks until one of those events has occurred. At that point you check which one occurred and handle it appropriately. These system calls are generally what the higher-level frameworks described above are built on.
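Python's standard `selectors` module wraps these system calls (`DefaultSelector` picks epoll, kqueue, or a fallback such as poll or select, depending on the platform), which makes it a compact way to show the main-loop pattern. The sketch below is a single-threaded echo server; the in-process client and the bounded loop exist only so the example terminates.

```python
import selectors
import socket

# DefaultSelector uses the best mechanism available: epoll on Linux,
# kqueue on BSD and macOS, and poll or select elsewhere.
sel = selectors.DefaultSelector()
echoed = []

def accept(server):
    conn, _ = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)   # start watching the new connection

def echo(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)   # fine for a sketch; a real server would buffer and wait for EVENT_WRITE
        echoed.append(data)
    else:                    # an empty read means the peer closed the connection
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

# An ordinary blocking client in the same process, only to drive the loop.
client = socket.create_connection(server.getsockname(), timeout=1)
client.sendall(b"hello")

# The main loop: block until something is ready, then dispatch to the
# handler stored as `data` when the socket was registered.
for _ in range(10):               # bounded so the example always terminates
    if echoed:
        break
    for key, _mask in sel.select(timeout=1):
        key.data(key.fileobj)

reply = b""
while len(reply) < 5:
    reply += client.recv(5 - len(reply))
print(reply)  # b'hello'

client.close()
for key in list(sel.get_map().values()):
    sel.unregister(key.fileobj)
    key.fileobj.close()
sel.close()
```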
For a very good overview of the staggering array of options available (focusing on the more traditional, lower-level Unix approaches), see The C10K Problem, a survey of techniques for handling 10,000 simultaneous connections. It also has a good list of C and C++ libraries that abstract over the various APIs, such as libevent.
A final option, of course, is to use one process or one OS thread for each connection. The problem is that processes are very heavyweight, and even threads are fairly heavyweight compared to many of these options. In general, for the best performance you would want one process or thread per CPU, each using an asynchronous I/O API to find out when it has work to do, and then dispatching that work to one of several objects or callbacks registered to handle connections, to one of several Erlang-style lightweight processes waiting for a message, or to something similar.
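For contrast, the one-thread-per-connection approach is easy to write because each handler can simply block. A minimal sketch using the standard library's `socketserver.ThreadingTCPServer`, which starts a new OS thread for every accepted connection; the `recv_exactly` helper and the two in-process clients are illustrative additions that only exist to exercise it.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Each connection gets its own OS thread, so plain blocking calls are fine:
        # recv() only stalls this connection's thread, not the whole server.
        while True:
            data = self.request.recv(4096)
            if not data:          # peer closed the connection
                break
            self.request.sendall(data)

def recv_exactly(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf

server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

addr = server.server_address
with socket.create_connection(addr, timeout=1) as a, socket.create_connection(addr, timeout=1) as b:
    a.sendall(b"one")
    b.sendall(b"two")
    replies = [recv_exactly(a, 3), recv_exactly(b, 3)]
print(replies)  # [b'one', b'two']

server.shutdown()       # stop serve_forever()
server.server_close()   # close the listening socket and wait for handler threads
```

The code is simple, but every open connection pins a whole OS thread (its stack plus a scheduler entry), which is the overhead that makes this model expensive at tens of thousands of long-lived connections.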
As a side note, WebSocket connections are not HTTP connections; they use a new protocol, the WebSocket protocol. However, a WebSocket can use the same port as HTTP, and you can upgrade an HTTP connection to a WebSocket, which keeps it compatible with existing firewall rules.
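As an illustration of that upgrade: in the handshake as finally standardized in RFC 6455 (earlier drafts used different handshakes), the client sends an ordinary HTTP GET with `Upgrade: websocket` and a random `Sec-WebSocket-Key` header, and the server shows it understood the request by hashing that key with a fixed GUID. A minimal sketch of the server's side; the function names are illustrative, and the sample key is the one used in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(client_key: str) -> str:
    """Base64 of the SHA-1 of the client's Sec-WebSocket-Key plus the GUID."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def upgrade_response(client_key: str) -> bytes:
    """The HTTP response that switches this connection over to the WebSocket protocol."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(client_key)}\r\n"
        "\r\n"
    ).encode("ascii")

print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After the 101 response, both sides stop speaking HTTP on that socket and exchange WebSocket frames instead, which is why the connection can stay open indefinitely on port 80 or 443.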