The KB article you refer to answers your question as follows:
To optimize performance at the application layer, Winsock copies data buffers from application send calls to a Winsock kernel buffer. Then, the stack uses its own heuristics (such as Nagle algorithm) to determine when to actually put the packet on the wire.
Setting TCP_NODELAY disables the Nagle algorithm, and setting SO_SNDBUF to 0 disables Winsock's send buffering, as the article goes on to explain:
The TCP_NODELAY socket option is applied to disable the Nagle algorithm so that the small data packets are delivered to the remote host without delay.
You can change the amount of Winsock kernel buffer allocated to the socket using the SO_SNDBUF option (it is 8K by default). If necessary, Winsock can buffer significantly more than the SO_SNDBUF buffer size. In most cases, the send completion in the application only indicates the data buffer in an application send call is copied to the Winsock kernel buffer and does not indicate that the data has hit the network medium. The only exception is when you disable the Winsock buffering by setting SO_SNDBUF to 0.
Reading your comment below, I realize you might be confused because setting TCP_NODELAY and setting SO_SNDBUF=0 both seem to do the same thing. If that is the case, please note that Nagle applies only to TCP streams (which segment data into packets), whereas SO_SNDBUF also applies to UDP sockets.
Setting SO_SNDBUF to zero explicitly stops all output buffering, so an immediate dispatch is attempted for each 'write' on the socket (at least in normal socket implementations).
Setting TCP_NODELAY explicitly disables the Nagle algorithm on TCP sockets, but the send buffer remains available and may still be used for delayed dispatch (after the send has already been reported to the application as successful).
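To make the difference concrete, here is a minimal sketch in Python (not from the KB article) that sets each option through the standard socket module. The helper names make_tcp_socket and make_udp_socket are made up for illustration. Note that what the OS does with a requested SO_SNDBUF of 0 is platform-specific: Winsock treats it as "no buffering", while other stacks may clamp it to a minimum size or reject it, so the demo reads the value back rather than assuming it.

```python
import socket


def make_tcp_socket(no_delay=False, send_buffer=None):
    """Create a TCP socket, optionally disabling Nagle and/or resizing the send buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if no_delay:
        # TCP-level option: turns off the Nagle algorithm for this socket only.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    if send_buffer is not None:
        # Socket-level option: requests a send-buffer size in bytes.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buffer)
    return sock


def make_udp_socket(send_buffer=None):
    """Create a UDP socket; SO_SNDBUF applies here too, TCP_NODELAY does not."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if send_buffer is not None:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buffer)
    return sock


if __name__ == "__main__":
    # Nagle off, send buffer left at the OS default.
    tcp = make_tcp_socket(no_delay=True)
    print("TCP_NODELAY:", tcp.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    print("SO_SNDBUF (default):", tcp.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    tcp.close()

    # Nagle left on, send buffering requested off (behaviour depends on the OS).
    try:
        tcp = make_tcp_socket(send_buffer=0)
        print("SO_SNDBUF after requesting 0:",
              tcp.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
        tcp.close()
    except OSError as exc:
        print("This platform rejected SO_SNDBUF=0:", exc)

    # The same socket-level option on a UDP socket.
    udp = make_udp_socket(send_buffer=4096)
    print("UDP SO_SNDBUF:", udp.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    udp.close()
```

The two options live at different levels (IPPROTO_TCP versus SOL_SOCKET), which is why they can be combined or set independently.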