views: 238
answers: 4

In a partially distributed network app I'm working on in C++ on Linux, I have a message-passing abstraction which sends a buffer over the network. The buffer is sent in two steps: first a 4-byte integer containing the size, then the buffer itself. The receiving end receives in two steps as well - one call to read() to get the size, and a second call to read() for the payload. So, this involves 2 system calls to read() and 2 system calls to write().

On localhost, I set up two test processes. Both processes send and receive messages to each other continuously in a loop. Each message was only about 10 bytes. For some reason, the test ran incredibly slowly - about 10 messages sent/received per second. And this was on localhost, not even over a network.

If I change the code so that there is only 1 system call to write() - i.e. the sending process packs the size at the head of the buffer and makes only 1 call to write() - the whole thing speeds up dramatically: about 10000 messages sent/received per second. That's an incredible difference in speed for just one fewer system call to write().

Is there some explanation for this?

+1  A: 

This is essentially the same question: C# socket abnormal latency.

In short, you'll want to use the TCP_NODELAY socket option. You can set it with setsockopt.

Ville Laurikari
A: 

You might be seeing the effects of the Nagle algorithm, though I'm not sure it is turned on for loopback interfaces.

If you can combine your two writes into a single one, you should always do that. No sense taking the overhead of multiple system calls if you can avoid it.

Steve Madsen
A: 

You don't give enough information to say for sure. You don't even say which protocol you're using.

Assuming TCP/IP, the socket could be configured to send a packet on every write, instead of buffering output in the kernel until the buffer is full or the socket is explicitly flushed. This means that TCP sends the two pieces of data in separate segments and has to reassemble them at the other end.

You might also be seeing the effect of the TCP slow-start algorithm. The first data sent is transmitted as part of the connection handshake. Then the TCP window size is slowly ramped up as more data is transmitted until it matches the rate at which the receiver can consume data. This is useful in long-lived connections but a big performance hit in short-lived ones. You can turn off slow-start by setting a socket option.

Have a look at the TCP_NODELAY and TCP_NOPUSH socket options.

An optimization you can use to avoid multiple system calls and fragmentation is scatter/gather I/O. Using the writev or sendmsg system call, you can send the 4-byte size and the variable-sized buffer in a single syscall, and both pieces of data will be sent in the same segment by TCP.

Nat
A: 

Okay, well I'm using TCP/IP (SOCK_STREAM) sockets. The example code is pretty straightforward. Here is a basic snippet that reproduces the problem. This doesn't include all the boilerplate setup code, error-checking, or byte-order (ntohl) code:

On the sending end:

// Send size
uint32_t size = strlen(buffer);
ssize_t res = write(sock, &size, sizeof(size));

// Send payload
res = write(sock, buffer, size);

And on the receiving end:

// Receive size
uint32_t size;
ssize_t res = read(sock, &size, sizeof(size));

// Receive payload
char* buffer = (char*) malloc(size);
read(sock, buffer, size);

Essentially, if I change the sending code to pack the size into the send buffer and make only one call to write(), it's almost 1000x faster.