Hello, we are currently performing some benchmarks for an open source academic project, Logbus-ng. It basically implements the Syslog protocol over UDP (RFC 5426) and TLS (RFC 5425). We know that the advantage of TLS is reliability (i.e. we won't lose messages), but at the cost of performance.

We have a benchmarking client and also a specially modified Apache installation that sends messages at high rates. Our goal is to reduce UDP packet loss to a minimum. Apache 1.3.41 has been instrumented to send special log messages via UDP (not in Syslog format, but in a short custom syntax we parse on the server side), and this instrumentation makes it send over 2000 messages when httpd starts, which is exactly what we want :)

Moreover, during Apache's ramp-up phase this small number of messages (compared to other workloads we submitted to the log server) is sent at an extremely high rate, possibly flooding UDP.
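
To give an idea of the shape of this workload, here is a minimal C# sketch of such a flooding sender. This is not our actual benchmarking client nor the Apache instrumentation (that code lives inside httpd, in C); the host name, port and message text below are made up for illustration only.

    using System;
    using System.Net.Sockets;
    using System.Text;

    class FloodSender
    {
        static void Main()
        {
            using (UdpClient sender = new UdpClient())
            {
                sender.Connect("logserver.example", 5426); // hypothetical host and port

                for (int i = 0; i < 2000; i++)
                {
                    // ~50-byte datagrams sent back to back with no pacing,
                    // roughly what httpd does during its ramp-up phase
                    byte[] msg = Encoding.ASCII.GetBytes("HTTPD-START " + i);
                    sender.Send(msg, msg.Length);
                }
            }
        }
    }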

Now, the log server runs on a different machine than the HTTP server, and both have barely decent hardware (not even a dual-core CPU, but a Pentium 4 with HyperThreading). The log server code is in C#. The following method is run by 4 threads at AboveNormal priority:

    UdpClient _client;
    IQueue<byte[]>[] _byteQueues; //not really IQueue, but a special FIFO queue class that reduces overhead to the minimum
    private volatile bool _listen = true;  //cleared to stop the listener threads
    private int _currentQueue = -1;        //shared round-robin counter
    private const int WORKER_THREADS = 4;  //number of worker queues

    private void ListenerLoop()
    {
        IPEndPoint remoteEndpoint = new IPEndPoint(IPAddress.Any, 0);
        while (_listen)
        {
            try
            {
                byte[] payload = _client.Receive(ref remoteEndpoint);

                //Round-robin dispatch to one of the worker queues. The double modulo
                //keeps the index non-negative even after the shared counter overflows
                //Int32.MaxValue and wraps around to negative values.
                _byteQueues[
                    (((Interlocked.Increment(ref _currentQueue))%WORKER_THREADS) + WORKER_THREADS)%WORKER_THREADS].
                    Enqueue(payload);
            }
            catch (SocketException)
            {
            }
            catch (Exception)
            {
            } //Really do nothing? Shouldn't we stop the service?
        }
    }

To reduce the time the thread spends outside the Receive method, we don't parse a message as soon as it is received; instead, we store it in one of 4 special queues that are read by other worker threads. As far as I know, the .NET scheduler is greedy: no matter how long threads have been waiting, higher-priority threads are scheduled first and can potentially starve the others. This is why we currently don't care about the growing number of threads in the application (around 20 in total).
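
For completeness, this is roughly what each of the 4 worker threads does. It is only a sketch: IQueue stands for our custom FIFO class, its Dequeue is assumed to block until a payload is available, and ParsePayload is a placeholder for the actual parsing code.

    private void WorkerLoop(IQueue<byte[]> queue)
    {
        //Workers stay at Normal priority; only the listener threads run AboveNormal
        Thread.CurrentThread.Priority = ThreadPriority.Normal;

        while (_listen)
        {
            byte[] payload = queue.Dequeue(); //assumed to block until a payload arrives

            try
            {
                ParsePayload(payload); //placeholder: parsing happens here, never in the listener
            }
            catch (FormatException)
            {
                //malformed message: drop it and keep the worker alive
            }
        }
    }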

Not only do we increase the thread priority, but we also try to increase the UDP receive buffer size to 1 MB. Here is a fragment of the initialization code:

try
{
    Socket clientSock = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp)
                            {
#if !MONO
                                //Related to Mono bug 643475
                                ExclusiveAddressUse = true,
#endif
                            };

    if (ReceiveBufferSize >= 0) clientSock.ReceiveBufferSize = ReceiveBufferSize;
    clientSock.Bind(localEp);
    _client = new UdpClient {Client = clientSock};
}
catch (SocketException ex)
{
    throw new LogbusException("Cannot start UDP listener", ex);
}

ReceiveBufferSize is configured at runtime...
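
A check that could be added right after that line (a sketch against the fragment above, not something already in the code): read the property back after setting it, because the kernel may silently grant less than requested (on Linux the effective size is capped by net.core.rmem_max), and a capped buffer would then behave much like the default one without any error being raised.

    if (ReceiveBufferSize >= 0)
    {
        clientSock.ReceiveBufferSize = ReceiveBufferSize;

        //The OS may grant less than requested without throwing;
        //reading the property back exposes the value actually in effect.
        int effective = clientSock.ReceiveBufferSize;
        if (effective < ReceiveBufferSize)
            Console.Error.WriteLine(
                "Requested a {0}-byte UDP receive buffer, but the kernel granted {1} bytes",
                ReceiveBufferSize, effective);
    }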

Each log message sent by Apache is very short, no more than 50 bytes I think. We run Gigabit Ethernet in our lab.

During the last experiment with this configuration, the log server received only 700+ of the more than 2900 messages generated. Wireshark reported more than 2900 messages on the UDP socket, but the log trace of Logbus (which stores all received messages into a file) reports only those 700/800. Running cat /proc/net/udp and using lsof to find the correct row shows lots of dropped packets. The logs were definitely sent at a very high rate. If we modified the Apache core to sleep for a short time (a little less than a millisecond) after each log call, we could reduce the loss to zero, but performance would be reduced to almost zero too. We will run such a test, but we must prove the effectiveness of Logbus-ng in real-life scenarios :(

My straight questions are

  1. Does UdpClient.ReceiveBufferSize help prevent packet loss? What else can I do in C#?
  2. It is obviously supposed to work in Mono too, but do you know of any possible bugs with that property? I mean, has anyone ever reported a bug? (Mono 2.8)
  3. Do you know if sending packets to localhost first may reduce packet loss? (I would run a special instance of the log server on the web server machine, then forward logs via TLS, which doesn't lose messages, to the real log server; see the sketch after this list.)
  4. What would you suggest to decrease the loss rate?
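
Regarding question 3, this is a sketch of the relay idea: a UDP listener bound to loopback on the web server machine that hands every datagram to a reliable (TLS) forwarder. ForwardOverTls and the port number are placeholders, not actual Logbus-ng API.

    //Sketch only: ForwardOverTls and the port are placeholders, not real Logbus-ng API
    private void RelayLoop()
    {
        using (UdpClient local = new UdpClient(new IPEndPoint(IPAddress.Loopback, 5427)))
        {
            IPEndPoint remote = new IPEndPoint(IPAddress.Any, 0);
            while (_listen)
            {
                //Loopback avoids the NIC and the wire, but datagrams can still be
                //dropped if this loop falls behind and the socket buffer fills up
                byte[] payload = local.Receive(ref remote);
                ForwardOverTls(payload); //placeholder: reliable hop to the real log server
            }
        }
    }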

We currently have to perform a special test with Apache, and we can only use UDP to deliver the messages. We can't choose TLS because we only have C# APIs for it, and the sender in this test is Apache itself.

Thank you in advance for any help. I hope I have been clear. You can find the source code of the UDP receiver on SVN if it helps.

+1  A: 

The ReceiveBufferSize definitely affects UDP sockets (i.e. UdpClient). If the packet loss is due to buffer overflow, then yes, increasing the ReceiveBufferSize will help.

Keep in mind that if the data rate is so high that you simply cannot read from the buffer quickly enough for long enough, then it is inevitable that you will overflow even the largest of buffers.

I have used UdpClient.Client.ReceiveBufferSize effectively on Mono 2.6.7 running on Ubuntu, so I believe the Mono implementation is fine; of course, I have not used this with Mono 2.8 yet.

In my experience, when sending UDP packets to localhost at extremely high rates some packet loss is possible, though I have never experienced this loss in a real-world application. So you might have some success with this approach.

You also need to look at where the packet loss is occurring. It might be due to the network infrastructure: packet collisions, or the switch dropping packets because of some limit on the switch.

Simply put, you need to be ready to handle and expect packet loss when using UDP.

Chris Taylor
I have no doubt that packet loss is a concrete risk. Most logging frameworks tolerate such loss (and thus rely on UDP rather than TCP), and I have demonstrated that long-term dependability analyses are not significantly affected by packet loss. But you understand that losing more than 70% of the packets is BAD. Thanks to your answer I'll move my effort in this direction, and I'll let you know next week when I repeat my experiments, maybe with a BIG HUGE buffer on localhost. :) Thanks Chris, you have been of help to us
djechelon