I was under the impression that the unreliability of UDP is a property of the physical layer, but it seems that it isn't:

I am trying to send a message over UDP; the message is divided into a sequence of packets, and message identification and re-ordering are handled implicitly.
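
For illustration, a minimal C++ sketch of the kind of framing such a scheme implies. The header layout (message id, fragment index, fragment count), the 1400-byte payload limit and the helper name are assumptions made up for the sketch, not the actual format used here:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Hypothetical per-datagram header so the receiver can identify a message
    // and put its fragments back in order.
    #pragma pack(push, 1)
    struct FragmentHeader {
        uint32_t messageId;     // which message this fragment belongs to
        uint16_t fragmentIndex; // position of this fragment within the message
        uint16_t fragmentCount; // total number of fragments in the message
    };
    #pragma pack(pop)

    // Split a message into datagram-sized fragments, each prefixed with the header.
    std::vector<std::vector<char>> fragmentMessage(uint32_t messageId,
                                                   const char* data, std::size_t size,
                                                   std::size_t maxPayload = 1400)
    {
        std::vector<std::vector<char>> fragments;
        const uint16_t count =
            static_cast<uint16_t>((size + maxPayload - 1) / maxPayload);

        for (uint16_t i = 0; i < count; ++i) {
            const std::size_t offset = static_cast<std::size_t>(i) * maxPayload;
            const std::size_t chunk  =
                (size - offset < maxPayload) ? size - offset : maxPayload;

            FragmentHeader hdr{messageId, i, count};
            std::vector<char> datagram(sizeof(hdr) + chunk);
            std::memcpy(datagram.data(), &hdr, sizeof(hdr));
            std::memcpy(datagram.data() + sizeof(hdr), data + offset, chunk);
            fragments.push_back(std::move(datagram));
        }
        return fragments;
    }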

I tested this method with two apps that run on the same computer and expected it to run smoothly. However, even though the data transfer was entirely between two programs on the same machine, there were packet losses, and quite frequent ones too. The losses also seem to be quite random: sometimes the whole message gets through, sometimes not.

Now, the fact that the losses occur even on the same machine makes me wonder whether I am doing it right.

Originally, I sent all the pieces of the message asynchronously in a single shot, without waiting for the completion of one piece before sending the next one.

Then I tried sending the next piece of the message from within the completion routine of the previous one. That did improve the packet-loss ratio, but didn't prevent losses altogether.

If I add a pause (Sleep(...)) between the pieces, it works 100% of the time.

EDIT: As the answers suggest, the packets are simply sent too fast and the OS does only minimal buffering. That's logical.

So, if I want to avoid adding acknowledgement and re-transmission to the system (I could just use TCP then), what should I do? What's the best way to reduce the packet loss without dropping the data rate lower than it needs to be?

EDIT 2: It occurred to me that the problem might not be buffer overflow so much as buffer unavailability. I am using async WSARecvFrom to receive, which takes a buffer that, as I understand it, overrides the default OS buffer. When a datagram is received, it is fed into that buffer and the completion routine is called, whether the buffer is full or not.

At that point there is no buffer at all to handle incoming data, until WSARecvFrom is called again from within the completion routine.

The question is whether there's a way to create some sort of buffer pool, so that data can be buffered while a different buffer is being processed.

+5  A: 

In your case, you're simply sending the packets too quickly for the receiving process to read them. The O/S will only buffer a certain number of received packets before it starts discarding them.

The simplest mechanism to avoid this is to have the receiving process send back a minimal ACK packet, but for the transmitting process to carry on regardless if it hasn't received the ACK within a few milliseconds or so.
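
A hedged sketch of that idea with plain Winsock calls, assuming the receiver has been changed to echo back a small ACK datagram; the ACK format and the 5 ms window are made up for the example:

    #include <winsock2.h>
    // link with Ws2_32.lib

    // Send one datagram, then wait up to timeoutMs for a tiny ACK from the
    // receiver. Whether or not the ACK arrives, the caller carries on sending.
    // Returns true if an ACK was seen, false on timeout.
    bool sendWithOptionalAck(SOCKET sock, const char* data, int len,
                             const sockaddr_in& dest, int timeoutMs = 5)
    {
        sendto(sock, data, len, 0,
               reinterpret_cast<const sockaddr*>(&dest), sizeof(dest));

        fd_set readSet;
        FD_ZERO(&readSet);
        FD_SET(sock, &readSet);

        timeval tv{};
        tv.tv_sec  = 0;
        tv.tv_usec = timeoutMs * 1000;

        // select() tells us whether a datagram (presumably the ACK) is waiting.
        if (select(0, &readSet, nullptr, nullptr, &tv) > 0) {
            char ack[16];
            sockaddr_in from{};
            int fromLen = sizeof(from);
            recvfrom(sock, ack, sizeof(ack), 0,
                     reinterpret_cast<sockaddr*>(&from), &fromLen);
            return true;
        }
        return false; // no ACK within the window; carry on regardless
    }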

EDIT - essentially, UDP is "fire and forget". There's no feedback mechanism built into the protocol like there is with TCP. The only way to tune the transmission rate is for the far end to tell you that it's not receiving the whole stream. See also RFC 2309.


Re: packet sequences - re-ordering doesn't happen because of the physical layer; typically it's because IP networks are "packet switched" as opposed to "circuit switched".

That means that each packet may take a different route through the network, and because those different routes can have different latencies, packets may then arrive out of order.

In practice these days very few packets are lost because of physical layer errors. Packets are lost because they're sent into a limited-throughput pipe at a rate higher than that pipe can accommodate. Buffering can help by smoothing out the packet flow rate, but if the buffer fills up you're back to square one.

Alnitak
I really don't see how this is an answer to what I asked. I wrote that re-ordering is done implicitly and is not the issue here; packet losses are. The question is why they happen on a single machine "talking" to itself, and how to prevent, or at least minimize, them.
it's in the last sentence, but I'll expand it...
Alnitak
Read the answer again especially the last paragraph. The bottleneck is probably the buffering + low-level OS handling of packets, which has to be able to keep up.
Jason S
@Hammer - BTW, you didn't actually ask any specific question except "am I doing it right?"
Alnitak
@Alnitak: You're right. Both the first comment and the edit were made before you edited your answer, sorry. However, as I wrote, I'd rather not use ACKs.
A: 

I suspect the IP layer of your machine cannot transmit the packets as fast as you send them.

Maybe that's because the protocol allows packets to be dropped when the other goal - transmitting packets as fast as possible - cannot otherwise be achieved.

The varying results could be explained by other traffic or CPU-hungry processes on your machine. Did you watch with top (Unix) or Process Explorer (NT) during your tests?

A: 

You have to be doing something wrong. The only ways you should be losing packets are: 1) the network is unreliable; 2) you are sending data faster than your receiving program can handle it; 3) you are sending messages that are bigger than the UDP maximum message size; 4) a device in your network has a smaller maximum message size (MTU), so you might be exceeding a limit there.

In case #1, since you are sending on the same machine, the network is not even involved so it should be 100% reliable. You didn't say you had 2 network cards so I don't think this is an issue.

In case #2, you usually have to send a heck of a lot of data before you start dropping data. From your description, that does not sound like the case.

In case #3, make sure all your messages fall below this limit.

In case #4, I'm pretty certain if you meet the UDP max message size then you should be ok, but there very well could be some older hardware or custom device with a small MTU that your data is going through. If that is the case then those packets will be silently dropped.

I have used UDP in many applications and it has proven very reliable. Are you using MFC to receive the messages? If you are, then you need to read the documentation very carefully, as it clearly states some issues you need to be aware of, but most people just gloss over them. I've had to fix quite a few of those oversights when people couldn't figure out why their messaging wasn't working.

EDIT: You say that your packets are implicitly reordered. I might begin by verifying that your implicit reordering is really working correctly. That seems like the most likely candidate for your problem.

EDIT #2: Have you tried using a network monitor? Microsoft has (or at least used to have) a free program called Network Monitor that will probably help.

Dunk
+2  A: 

If you're using UDP, the only way to detect packet loss, as far as I know, is going to involve some sort of feedback. If you're on a network with fairly consistent throughput, you could do a training period where you send bursts of data and wait for the receiver to respond and tell you how many packets from the burst it received (i.e. make the receiver count and, after a timeout, respond with the number it got). Then you just step up the amount of data per burst until you hit the limit, and drop back down a little just to be sure.

This would avoid acks after the initial evaluation period, but will only work if the load on the network / receiving process does not change.
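
A rough Winsock sketch of that calibration idea. The one-byte "report request" message, the uint32 reply carrying the receiver's count, the timeouts and the burst sizes are all assumptions made up for the sketch:

    #include <winsock2.h>
    #include <cstdint>
    // link with Ws2_32.lib

    // Send bursts of increasing size, ask the receiver how many datagrams it
    // counted, and back off to the last burst size that got through intact.
    int calibrateBurstSize(SOCKET sock, const sockaddr_in& dest)
    {
        char payload[1024] = {};          // dummy datagram body
        int burst = 16;                   // starting burst size
        int lastGoodBurst = burst;

        for (int round = 0; round < 8; ++round) {
            // 1. Send the burst.
            for (int i = 0; i < burst; ++i)
                sendto(sock, payload, sizeof(payload), 0,
                       reinterpret_cast<const sockaddr*>(&dest), sizeof(dest));

            // 2. Ask the receiver for its count and wait briefly for the reply.
            const char request = 'R';
            sendto(sock, &request, 1, 0,
                   reinterpret_cast<const sockaddr*>(&dest), sizeof(dest));

            fd_set readSet; FD_ZERO(&readSet); FD_SET(sock, &readSet);
            timeval tv{0, 200 * 1000};    // 200 ms to hear back
            uint32_t received = 0;        // count reported by the receiver
            if (select(0, &readSet, nullptr, nullptr, &tv) > 0) {
                sockaddr_in from{}; int fromLen = sizeof(from);
                recvfrom(sock, reinterpret_cast<char*>(&received), sizeof(received),
                         0, reinterpret_cast<sockaddr*>(&from), &fromLen);
            }

            // 3. Step up while everything arrives; drop back a little otherwise.
            if (received == static_cast<uint32_t>(burst)) {
                lastGoodBurst = burst;
                burst *= 2;
            } else {
                burst = lastGoodBurst; // the last size that made it through intact
                break;
            }
        }
        return lastGoodBurst;
    }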

I've written UDP clients in Python before, and the only time I've found any significant packet loss was when the input buffer on the receiving process was too small. As a result, when the system was under heavy load, you'd get packet loss because the buffer would silently overflow.
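
In Winsock terms, the equivalent knob is SO_RCVBUF. A hedged sketch: enlarging it only raises the point at which datagrams start being discarded, it does not remove the loss, and the 1 MiB figure is arbitrary:

    #include <winsock2.h>
    #include <cstdio>
    // link with Ws2_32.lib

    // Ask the OS for a larger socket receive buffer, then read back what it
    // actually granted.
    bool enlargeReceiveBuffer(SOCKET sock, int desiredBytes = 1 << 20 /* 1 MiB */)
    {
        if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                       reinterpret_cast<const char*>(&desiredBytes),
                       sizeof(desiredBytes)) == SOCKET_ERROR) {
            std::printf("setsockopt(SO_RCVBUF) failed: %d\n", WSAGetLastError());
            return false;
        }

        int actual = 0;
        int len = sizeof(actual);
        if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                       reinterpret_cast<char*>(&actual), &len) == 0) {
            std::printf("receive buffer is now %d bytes\n", actual);
        }
        return true;
    }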

Jon Cage
+3  A: 

In order to avoid the problem of the OS buffers, you need to implement a rate control system. It can be closed-loop (the receiver sends back ACKs and information about its buffers) or open-loop (the sender slows itself down, which means you have to be conservative).
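
For the open-loop case, a minimal pacing sketch; the datagrams-per-second figure has to be picked conservatively by hand, which is exactly the trade-off mentioned above:

    #include <winsock2.h>
    #include <windows.h>
    #include <vector>
    // link with Ws2_32.lib

    // Open-loop pacing: spread the datagrams of one message over time instead
    // of firing them back-to-back. Nothing here adapts to feedback, so the
    // rate must be chosen below what the receiver can drain.
    void sendPaced(SOCKET sock, const sockaddr_in& dest,
                   const std::vector<std::vector<char>>& datagrams,
                   int datagramsPerSecond)
    {
        const int batch = (datagramsPerSecond + 999) / 1000;  // datagrams per millisecond

        int sentInBatch = 0;
        for (const auto& d : datagrams) {
            sendto(sock, d.data(), static_cast<int>(d.size()), 0,
                   reinterpret_cast<const sockaddr*>(&dest), sizeof(dest));

            if (++sentInBatch >= batch) {
                // Sleep(1) can actually sleep much longer (timer granularity),
                // which only makes the sender more conservative.
                Sleep(1);
                sentInBatch = 0;
            }
        }
    }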

There are semi-standard protocols on top of UDP that implement both. RBUDP (Reliable Blast UDP) springs to mind, and there are others.

Chris Arguin
+1 for citation of RBUDP, sounds interesting....
Jason S
After a moment's googling I've found a useful looking comparison [http://www.csm.ornl.gov/~dunigan/netperf/udp/UDP_RBUDP.html] ...so +1 from me :-)
Jon Cage
+1  A: 

If you pass the WSA_FLAG_OVERLAPPED flag to WSASocket(), you can call WSARecvFrom() multiple times in order to queue up multiple receive I/O requests. That way there is already another buffer available to receive the next packet, even before your completion routine queues another I/O request.
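
A hedged sketch of that pattern: a small pool of receive contexts, each with its own buffer and WSAOVERLAPPED, posted up front and re-posted from the completion routine. The pool size, port and names are illustrative:

    #include <winsock2.h>
    #include <cstdio>
    #include <cstring>
    // link with Ws2_32.lib

    // One outstanding receive: its own buffer, OVERLAPPED block and source address.
    struct RecvContext {
        WSAOVERLAPPED overlapped;   // kept first so the pointer passed to the
                                    // completion routine maps back to the context
        SOCKET        sock;
        char          buffer[2048];
        WSABUF        wsaBuf;
        sockaddr_in   from;
        INT           fromLen;
        DWORD         flags;
    };

    static void postReceive(RecvContext* ctx);

    // Runs while the thread is in an alertable wait; process the datagram,
    // then immediately hand the same buffer back to the OS.
    static void CALLBACK onReceive(DWORD error, DWORD bytes,
                                   LPWSAOVERLAPPED overlapped, DWORD /*flags*/)
    {
        RecvContext* ctx = reinterpret_cast<RecvContext*>(overlapped);
        if (error == 0) {
            // ... handle ctx->buffer[0 .. bytes) here ...
            std::printf("got %lu bytes\n", bytes);
        }
        postReceive(ctx);
    }

    static void postReceive(RecvContext* ctx)
    {
        ctx->wsaBuf.buf = ctx->buffer;
        ctx->wsaBuf.len = sizeof(ctx->buffer);
        ctx->fromLen    = sizeof(ctx->from);
        ctx->flags      = 0;
        std::memset(&ctx->overlapped, 0, sizeof(ctx->overlapped));

        int rc = WSARecvFrom(ctx->sock, &ctx->wsaBuf, 1, nullptr, &ctx->flags,
                             reinterpret_cast<sockaddr*>(&ctx->from), &ctx->fromLen,
                             &ctx->overlapped, onReceive);
        if (rc == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING)
            std::printf("WSARecvFrom failed: %d\n", WSAGetLastError());
    }

    int main()
    {
        WSADATA wsa;
        WSAStartup(MAKEWORD(2, 2), &wsa);

        SOCKET sock = WSASocket(AF_INET, SOCK_DGRAM, IPPROTO_UDP,
                                nullptr, 0, WSA_FLAG_OVERLAPPED);
        sockaddr_in addr = {};
        addr.sin_family      = AF_INET;
        addr.sin_port        = htons(5000);        // illustrative port
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));

        RecvContext pool[8] = {};                  // 8 outstanding receives, arbitrary
        for (RecvContext& ctx : pool) {
            ctx.sock = sock;
            postReceive(&ctx);
        }

        for (;;)
            SleepEx(INFINITE, TRUE);               // alertable wait so completions run
    }

Because each completion routine immediately re-posts its context, there are always several buffers queued with the kernel, which is essentially the "buffer pool" the question's second edit asks about.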

This doesn't necessarily mean you won't drop packets. If your program doesn't supply enough buffers fast enough, or it takes too long to process them and re-queue them, then it won't be able to keep up, and that's when some sort of rate limiting may be helpful.

bk1e