Hello,

Over the last couple of months I've been working on some socket server implementations in C++ and Java. I wrote a small server in Java that handles and processes input from a Flash application hosted on a website, and I managed to write a C++ server that handles input from a 2D game client with multiple players. I used TCP in one project and UDP in the other. Now I have some questions that I couldn't really find answers to on the net, and I hope some of the experts here can help me. :)

Let's say I would like to build a server in C++ that would handle the input from thousands of standalone and/or web applications, how should I design my server then? So far, I usually create a new & unique thread for each user that connects, but I doubt this is the way to go.

Also, how does one determine the layout of packets sent over the network; is data usually sent over the network in a binary or text format? How do you handle serialized objects when you send data to different platforms (e.g. C++ server to Flash application)?

And last, is there any easy-to-use, commonly used library that supports portability (e.g. development on a Windows machine and deployment on a Linux box) other than Boost.Asio?

Thank you.

+5  A: 

Sounds like you have a couple of questions here. I'll do my best to answer what I can see.

1. How should I handle threading in my network server?

I would take a good look at what kind of work you're doing on the worker threads that are being spawned by your server. Spawning a new thread for each request isn't a good idea...but it might not hurt anything if the number of parallel requests is small and the tasks performed on each thread are fast running.

If you really want to do things the right way, you could have a configurable/dynamic thread pool that would recycle the worker threads as they became free. That way you could set a max thread pool size. Your server would then work up to the pool size...and then make further requests wait until a worker thread was available.
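
To make the idea concrete, here is a minimal sketch of a fixed-size pool, assuming C++11 or later. The `ThreadPool` class and its `submit` method are names made up for this illustration, not part of any library, and a production pool would also want things like a bounded queue, dynamic resizing, and exception handling.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: at most `size` tasks run at once;
// further tasks wait in the queue until a worker becomes free.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t size) {
        assert(size > 0);
        for (std::size_t i = 0; i < size; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers finish every queued task, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues a task, e.g. "handle the request that just arrived".
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

A server would create one pool at startup and call `submit` for each incoming request instead of spawning a thread. On most Linux toolchains this needs the `-pthread` flag.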

2. How do I format the data in my packets?

Unless you're developing an entirely new protocol...this isn't something you really need to worry about at the packet level. Unless you're dealing with streaming media (or another application where packet loss/corruption is acceptable), you probably won't be using UDP for this application. TCP/IP is probably going to be your best bet...and it takes care of packetization, ordering, and retransmission for you, so what's left to design is the format of your own messages.
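
One caveat worth sketching: TCP delivers a byte stream without message boundaries, so the application still has to mark where each message ends. A common convention is a length prefix. The sketch below assumes a 4-byte big-endian length followed by the payload; `encode_frame` and `FrameDecoder` are hypothetical names for this illustration, and a real server should also cap the accepted length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical framing: 4-byte big-endian payload length, then the payload.
std::string encode_frame(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string frame;
    for (int shift = 24; shift >= 0; shift -= 8)
        frame.push_back(static_cast<char>((n >> shift) & 0xFF));
    frame += payload;
    return frame;
}

class FrameDecoder {
public:
    // Append bytes exactly as they arrive from recv(); a chunk may hold
    // part of a frame, one whole frame, or several frames.
    void feed(const char* data, std::size_t len) { buffer_.append(data, len); }

    // Moves the next complete message into `out`.
    // Returns false if more bytes are needed first.
    bool next(std::string& out) {
        if (buffer_.size() < 4) return false;
        std::uint32_t n = 0;
        for (std::size_t i = 0; i < 4; ++i)
            n = (n << 8) | static_cast<unsigned char>(buffer_[i]);
        if (buffer_.size() - 4 < n) return false;
        out = buffer_.substr(4, n);
        buffer_.erase(0, 4 + static_cast<std::size_t>(n));
        return true;
    }

private:
    std::string buffer_;
};
```

The receiver calls `feed` after every read and then loops on `next` until it returns false, which handles both split and merged messages.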

3. Which format do I use for serialization?

The way you serialize your data over the wire depends on what kind of applications are going to be consuming your service. Binary serialization is usually faster and results in a smaller amount of data that needs to be transferred over the network. The downside is that one language's native binary serialization format may not be readable from another. Therefore the clients connecting to your server are, most likely, going to have to be written in the same language you are using.

XML serialization is another option. It will take longer and have a larger amount of data to be transmitted over the network. The upside to using something like XML serialization is that you won't be limited in the types of clients that can connect to your server and consume your service.

You have to choose what fits your needs the best.

...play around with the different options and figure out what works best for you. Hopefully you'll find something that can perform faster and more reliably than anything I've mentioned here.

Justin Niessner
I like your answer very much. Just a quick clarification: is there a serialization scheme that would be compatible with more than one language? For example, is Java native serialization compatible with .NET's?
Pablo Santa Cruz
If you're talking about binary serialization formats...there are none that I know of. Text-based serialization formats (like XML, JSON, etc.) are the only ones, since they leave the low-level representation up to the language (List<T> in C# might wind up being T[] in Java)
Justin Niessner
I realized this morning that I forgot to mention COM+ and DCOM. These technologies provided a binary compatibility layer on top of your components. You had to use the COM-compatible types...but it was a binary serialization that crossed languages.
Justin Niessner
IMHO, I believe that Protocol Buffers fits the bill for binary serialization too.
Camilo Díaz
+1  A: 

You're still going to need a socket to handle every client, but the idea would be to create a pool of X sockets (say 50) and then, when you get close (say 90%) to consuming all those sockets, create another pool of X sockets. At some point, after clients have connected, sent data and disconnected, some of your sockets will be available again and you can reuse them (google "socket pools" for more info).

The layout of data is always difficult. If all your clients and servers will be using the same hardware and operating system, you can send data in binary format, but there are many traps there (byte alignment is at the top of the list). Sending formatted text is always easier, but certainly more expensive in terms of bandwidth and processing power, because you have to convert from the machine representation to text before sending and, of course, back again at the receiver.

re: serialized, I'm sorry, I can't help you, nor with libraries (I'm too embedded to have used much of these)

KevinDTimm
Seems that you are confusing sockets and threads here.
Nikolai N Fetissov
yeah, I mixed my terms at the beginning (fixed now)
KevinDTimm
+3  A: 

As far as server design is concerned, I would say that you are right: although ONE-THREAD-PER-SOCKET is a simple and easy approach, it is not the way to go since it won't scale as well as other server design patterns.

I personally like the COMMUNICATION-THREADS/WORKER-THREADS approach, where a dynamically sized pool of worker threads handles all the work generated by producer threads.

In this model, you will have a number of threads in a pool waiting for tasks that are going to be generated from another set of threads handling network I/O.

I found UNIX Network Programming by Richard Stevens an amazing source for this kind of network programming approach. And, despite its name, it will be very useful in Windows environments as well.

Regarding the layout of the packets (you should have posted a separate question for this, since it is a totally different question, in my opinion), there are tradeoffs when choosing between a TEXT and a BINARY approach.

TEXT (e.g. XML) is probably easier to parse and document, and simpler in general, while a BINARY protocol should give you better performance in terms of processing speed and network packet size, but you will have to deal with more complicated issues such as the ENDIANNESS of words and similar details.
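
As a rough illustration of the endianness point, here is a sketch that writes each field byte by byte in big-endian (network) order, so neither the host's byte order nor struct padding reaches the wire. The `PlayerMove` message and both function names are invented for this example and do not come from any real protocol.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical message, for illustration only.
struct PlayerMove {
    std::uint16_t player_id;
    std::uint32_t x;
};

// Serializes field by field, most-significant byte first.
// The result is always 6 bytes, whatever sizeof(PlayerMove) happens to be.
std::vector<unsigned char> encode_move(const PlayerMove& m) {
    std::vector<unsigned char> out;
    out.push_back(static_cast<unsigned char>((m.player_id >> 8) & 0xFF));
    out.push_back(static_cast<unsigned char>(m.player_id & 0xFF));
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((m.x >> shift) & 0xFF));
    return out;
}

// Reads the 6-byte layout produced by encode_move().
// The caller must guarantee that `in` points at 6 readable bytes.
PlayerMove decode_move(const unsigned char* in) {
    PlayerMove m;
    m.player_id = static_cast<std::uint16_t>((in[0] << 8) | in[1]);
    m.x = 0;
    for (std::size_t i = 2; i < 6; ++i)
        m.x = (m.x << 8) | in[i];
    return m;
}
```

Because the shifts operate on values rather than on memory, the same code produces the same bytes on little-endian and big-endian hosts.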

Hope it helps.

Pablo Santa Cruz
+2  A: 

Though previous answers provide good direction, just for completeness, I'd like to point out that threads are not an absolute requirement for great socket server performance. Some examples are here. There are many approaches to scalability too - thread pools, pre-forked processes, server pools, etc.
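
For a flavour of the thread-free style, here is a sketch of one pass of a single-threaded event loop built on `poll()`. It is POSIX-only (Windows would need `WSAPoll` or a portable library instead), and `echo_ready_sockets` is a name invented for this illustration. It simply echoes back whatever has arrived, which stands in for real request handling.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <vector>

// One pass of a single-threaded loop: wait until any descriptor in `fds`
// is readable (or the timeout expires), then echo its pending bytes back.
// Returns how many descriptors were serviced, or -1 if poll() failed.
int echo_ready_sockets(const std::vector<int>& fds, int timeout_ms) {
    std::vector<pollfd> watched;
    for (int fd : fds) {
        pollfd p{};
        p.fd = fd;
        p.events = POLLIN;
        watched.push_back(p);
    }

    if (poll(watched.data(), watched.size(), timeout_ms) < 0) return -1;

    int serviced = 0;
    for (const pollfd& p : watched) {
        if (!(p.revents & POLLIN)) continue;
        char buf[512];
        const ssize_t n = recv(p.fd, buf, sizeof buf, 0);
        if (n > 0) {
            // Sketch only: a real server must handle partial sends.
            send(p.fd, buf, static_cast<std::size_t>(n), 0);
            ++serviced;
        }
        // n == 0 means the peer closed the connection; a real loop
        // would close and forget the descriptor here.
    }
    return serviced;
}
```

A server would call this in a loop, adding newly accepted sockets to `fds`. One thread can watch many connections this way, as long as the per-message work stays short.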

Nikolai N Fetissov
+1  A: 

Thank you for all the replies, this was exactly what I needed!

Chaoz
+1  A: 

1) And last, is there any easy to use library which is commonly used that supports portability (eg development on a windows machine & deployment on a linux box) other than boost asio.

The ACE library is another alternative. It's very mature (it has been around since the early 90s) and widely deployed. A brief discussion about how it compares to Boost.Asio is available on the Riverace website here. Keep in mind that ACE has had to support a large number of legacy platforms for a long time, so it doesn't use modern C++ features as much as Boost.Asio does, for example.

2) Let's say I would like to build a server in C++ that would handle the input from thousands of standalone and/or web applications, how should I design my server then? So far, I usually create a new & unique thread for each user that connects, but I doubt this is the way to go.

There are a number of commonly used approaches, including but not limited to thread-per-connection (the approach you describe) and thread pool (the approach Justin described). Each has its pros and cons. Many people have looked at the trade-offs. A good starting point might be the links on the Thread Pool Pattern Wikipedia page.

Dan Kegel's "The C10K Problem" web page has lots of useful notes about improving scalability as well.

3) Also, how does one determine the layout of packets sent over the network; is data usually sent over the network in a binary or text format? How do you handle serialized objects when you send data to different platforms (e.g. C++ server to Flash application)?

I agree with others that sending binary data is generally going to be most efficient. The Boost.Serialization library can be used to marshal data into a binary form (as well as text). Mature binary formats include XDR and CDR. CDR is the format used by CORBA, for instance. The company ZeroC defines the Ice encoding, which is supposed to be much more efficient than CDR.

There are lots of binary formats to choose from. My suggestion would be to avoid reinventing the wheel by at least reading about some of these binary formats so that you don't end up running into the same pitfalls these existing binary formats were designed to address.

That said, lots of middleware exists that already provides a canned solution for most of your needs. For example, OpenSplice and OpenDDS are both implementations of the OMG Data Distribution Service standard. DDS focuses on efficient distribution of data such as through a publish-subscribe model, rather than remote invocation of functions. I'm more familiar with the OMG defined technologies but I'm sure there are other middleware implementations that will fit your needs.

Void
A: 

Dear colleagues, a few words about server sockets and serialization (marshaling).

The most important problem is the growing number of sockets that are in a readable or writable state in select. I don't mean the limitation in FD_SET; that is easy to solve. I mean the growing signaling time and the accumulation of data in sockets that have not been read yet while the data from the socket currently being handled is processed. So the solution may even lie outside the software and require a multiprocessor model in which the roles of the processors are limited: one reads and writes, N do the processing. In that case all available socket data should be read as soon as select returns and handed over to the processing units. The same applies to incoming data.

About marshaling: of course a binary format is preferable because of performance. By the way, XML has the same problem when it comes to Unicode. But, comrades, it is not as simple as copying a long or integer value into a socket stream. Here even htons and htonl can help (the data travels in network byte order and these functions handle the conversion). It is safer, though, to send the data after a representation header that describes the order of most/least significant bits, the byte order, and the IEEE data type. This works; I have never had a case where it did not.

Kind regards and great success to everyone. Simon Cantor

Simon