I do a lot of high-performance network programming with Boost.Asio (or just Asio, if you will) and have a pretty solid grasp of the essentials of both TCP and UDP. Even so, I still don't consider myself a networking expert. What is a good way to frame the essentials that network programmers should know, especially those trying to push the performance of large network-based applications?

There is a great essay on programmers and what they should know about memory (see below), so I'm wondering if someone has put together something similar for networking.

What every programmer should know about memory

+7  A: 

Some bullet points off the top of my head of things you should know:

  • How and why TCP works... three-way handshakes, acknowledgements, delayed ACK, Nagle's algorithm, the sliding window protocol. There's a concrete reason for every one of those features... and each of them can destroy your application's performance if handled improperly.
  • UDP multicast... even if you never think you'll use it, you need to know why it exists so you can make educated decisions when designing systems.
  • IP fragmentation, and the impact of MTU.
  • Binary serialization and network byte ordering (even if you're just going to use Google Protocol Buffers, it's nice to understand why they are efficient).
  • ASCII serialization and message framing (what does \r\n\r\n mean in HTTP?)
  • Different I/O dispatch models: Apache-style preforking, thread-per-connection, event-based single-threaded, event-based with worker threads, etc.
  • The impact of buffer-overflow vulnerabilities in a networked app
  • Protocol-based design, as opposed to API- or library-based design
  • Asynchronous vs. synchronous protocols. Many high-performance systems are asynchronous. HTTP is synchronous unless you use pipelining, and even then there are many restrictions on what is possible... no out-of-order responses, for example.
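
To make the Nagle point concrete: Nagle's algorithm holds back small writes while earlier data is still unacknowledged, and together with delayed ACKs that can add noticeable latency to small request/response messages. Here is a minimal sketch, assuming a latency-sensitive app that sends small messages, of turning it off with the standard TCP_NODELAY socket option. The helper name is made up for illustration.

```python
import socket

def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 asks the stack to send small segments right away
    # instead of coalescing them while an earlier segment awaits its ACK.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = make_low_latency_socket()
# Read the option back to confirm the stack accepted it.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

In Boost.Asio the same option is exposed as boost::asio::ip::tcp::no_delay. Whether disabling Nagle helps depends on your traffic pattern, so measure before and after.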

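For the binary serialization and byte ordering bullet, here is a rough sketch of length-prefixed framing. TCP is a byte stream, so the receiver has to find message boundaries itself. The 4-byte big-endian length header, the size cap, and the function names are all assumptions chosen for illustration, not any particular protocol.

```python
import struct

HEADER = struct.Struct("!I")   # "!" = network (big-endian) order, "I" = uint32
MAX_FRAME = 1 << 20            # assumed cap; never trust a peer-supplied length

def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length in network byte order."""
    return HEADER.pack(len(payload)) + payload

def decode_frames(buffer: bytes):
    """Return (complete_payloads, leftover_bytes) for a received byte buffer."""
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        if length > MAX_FRAME:
            raise ValueError("frame too large")
        end = offset + HEADER.size + length
        if end > len(buffer):
            break              # partial frame: wait for more bytes
        frames.append(buffer[offset + HEADER.size:end])
        offset = end
    return frames, buffer[offset:]

stream = encode_frame(b"hello") + encode_frame(b"world!")
# Simulate a read that stops two bytes short of the second frame.
frames, rest = decode_frames(stream[:-2])
```

The leftover bytes are kept and prepended to the next read, which is the part people tend to forget when a single recv happens to return a whole message during testing.
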

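As for what \r\n\r\n means in HTTP: header lines end with \r\n, and an empty line (so two \r\n in a row) marks the end of the header section. A minimal sketch of delimiter-based framing follows, with a hypothetical helper name and deliberately no real HTTP parsing:

```python
HEADER_END = b"\r\n\r\n"

def split_http_head(buffer: bytes):
    """Return (head_lines, remaining_bytes), or (None, buffer) if the head is incomplete."""
    head, sep, rest = buffer.partition(HEADER_END)
    if not sep:
        return None, buffer    # delimiter not seen yet: keep buffering
    lines = head.decode("ascii").split("\r\n")
    return lines, rest

partial = b"POST /submit HTTP/1.1\r\nHost: example.com\r\n"
incomplete, kept = split_http_head(partial)
# Once the blank line arrives, the head can be split from what follows.
lines, rest = split_http_head(partial + b"\r\nbody-bytes")
```

Compared with a length prefix, a delimiter is easy to read and debug by hand, but the receiver has to scan for it and the payload must not contain it unescaped.
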
Update: What does protocol-based design mean?

Consider HTTP, the protocol of the web. Apache, IIS, Lighttpd, Firefox, Opera, WebKit, etc... All of these pieces of software speak HTTP. It's quite possible that none of them share the code that does so. The downside, of course, is an increased likelihood of bugs because of the sheer volume of code written. There are numerous upsides:

  • Any program can communicate via HTTP, regardless of implementation language
  • Lightweight/embedded environments can pick and choose a subset of the protocol, rather than using the whole thing
  • It's possible to optimize a protocol handler for particular situations. It's not possible to optimize a library without sacrificing generality.
  • A variety of different implementations forces library providers to address bugs (rather than just blowing them off because, well, everyone uses the same library).
  • There is no organizational or contractual burden on users of HTTP, no licensing fees.

When you design a network protocol, you can build yourself several APIs, each tailored towards specific use cases, or you can build just one; it's up to you. Networked software components can be upgraded independently of each other. Basically, you get everything you hear that's good about Java/C# interfaces and C++ abstract classes, but applied at the network layer rather than the programming-language layer.
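
A toy sketch of that idea, under loud assumptions: the wire format below is invented for illustration, and the "transport" is just a function call standing in for a socket round trip. The point is that the server and the two client APIs agree only on the bytes, not on any shared code.

```python
# Hypothetical wire format (the "protocol"):
#   request:  b"GET <key>\n"  or  b"SET <key> <value>\n"
#   response: b"OK <value>\n", b"OK\n" or b"ERR\n"

def serve(request: bytes, store: dict) -> bytes:
    """One possible server: it only has to honor the wire format."""
    parts = request.decode("ascii").rstrip("\n").split(" ", 2)
    if parts[0] == "SET" and len(parts) == 3:
        store[parts[1]] = parts[2]
        return b"OK\n"
    if parts[0] == "GET" and len(parts) == 2 and parts[1] in store:
        return f"OK {store[parts[1]]}\n".encode("ascii")
    return b"ERR\n"

# API 1: a thin, explicit client for callers who want the raw reply.
def raw_get(transport, key: str) -> bytes:
    return transport(f"GET {key}\n".encode("ascii"))

# API 2: a dict-like wrapper for callers who want convenience.
class RemoteDict:
    def __init__(self, transport):
        self._transport = transport

    def __setitem__(self, key, value):
        reply = self._transport(f"SET {key} {value}\n".encode("ascii"))
        if reply != b"OK\n":
            raise KeyError(key)

    def __getitem__(self, key):
        reply = self._transport(f"GET {key}\n".encode("ascii"))
        if not reply.startswith(b"OK "):
            raise KeyError(key)
        return reply[3:-1].decode("ascii")

store = {}
transport = lambda request: serve(request, store)  # stands in for the network
remote = RemoteDict(transport)
remote["color"] = "blue"
```

Either client API could be rewritten, or the server replaced by one in another language, without the other side noticing, as long as the bytes on the wire stay the same.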

Tom
Thanks, great list. Can you expand on this point though? "Protocol-based design, as opposed to API- or library-based design"
ApplePieIsGood
Ah got you, that makes total sense. Thanks for the clarification. Any recommendation on where to read up on these things? Besides the multi-volume TCP book series, anything a little more condensed and geared towards devs?
ApplePieIsGood
Sorry, I don't really know of any reference material for this... it's all nuggets I've picked up on the job.
Tom
Reginaldo