I've been researching a number of networking libraries and frameworks lately such as libevent, libev, Facebook Tornado, and Concurrence (Python).
One thing I notice in their implementations is the use of application-level per-client read/write buffers (e.g. IOStream in Tornado) -- even HAProxy has such buffers.
In addition to these application-level buffers, there's the OS kernel TCP implementation's buffers per socket.
I can understand the app/lib's use of a read buffer I think: the app/lib reads from the kernel buffer into the app buffer and the app does something with the data (deserializes a message therein for instance).
However, I have confused myself about the need/use of a write buffer. Why not just write to the kernel's send/write buffer? Is it to avoid the overhead of system calls (write)? I suppose the point is to be ready with more data to push into the kernel's write buffer when the kernel notifies the app/lib that the socket is "writable" (e.g. EPOLLOUT). But, why not just do away with the app write buffer and configure the kernel's TCP write buffer to be equally large?
Also, consider a service for which disabling the Nagle algorithm makes sense (e.g a game server). In such a configuration, I suppose I'd want the opposite: no kernel write buffer but an application write buffer, yes? When the app is ready to send a complete message, it writes the app buffer via send() etc. and the kernel passes it through.
Help me to clear up my head about these understandings if you would. Thanks!