A:

UDP doesn't verify that the target received the data (as TCP does); if you want to ensure that all of the data arrives, you must implement retransmission and similar mechanisms in your application. Do you control the sending UDP source?

lunixbochs
I have no control over the source; it simply sends datagrams (512 bytes each) at a rate of 1200 packets per second.
666craig
Is it absolutely necessary to receive all of the data? What kind of data are you receiving?
lunixbochs
A: 

Edit: struck out the listen/accept sentence. Thanks, Daniel; I was just coming to remove it when I saw your comment :)

I'd suggest that this is a network programming issue rather than a Python issue per se.

You've set a packets-per-second rate and a duration to define the number of recv calls you make on your UDP socket. [Struck out: I don't see a listen or accept call on the socket; I'll assume that recv handles that, as you say you receive some data.] You've not mentioned how the data is generated.

You've defined how many reads you expect to make, so I'd assume the code makes that many receives before exiting. My conclusion would therefore be that your recv packetSize is insufficient, so a single read isn't pulling an entire datagram. Note that with UDP the rest of a truncated datagram is discarded rather than returned by the subsequent recv, so an undersized buffer shows up as lost data.
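What an undersized recv buffer does to UDP data is easy to check with a loopback test. This is a minimal sketch, assuming Linux or macOS semantics, where the excess bytes of a truncated datagram are silently discarded (on Windows the same recv raises OSError with WSAEMSGSIZE instead). The function name and sizes are illustrative, not from the question.

```python
import socket


def demo_truncation(payload_size=512, bufsize=256):
    """Send two datagrams over loopback and read the first with a too-small buffer."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        rx.settimeout(1.0)
        addr = rx.getsockname()
        tx.sendto(b"A" * payload_size, addr)
        tx.sendto(b"B" * payload_size, addr)
        first = rx.recv(bufsize)        # truncated; the remaining bytes are thrown away
        second = rx.recv(payload_size)  # the NEXT datagram, not the rest of the first
        return first, second
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    first, second = demo_truncation()
    print(len(first), first[:1], len(second), second[:1])
```

The second read returns the whole "B" datagram, which is why an undersized buffer looks like missing data rather than misaligned data.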

Can't you look at the data you have received and determine what is missing? What data are you "losing"? How do you know it's lost?
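If the payload carries any kind of packet counter, working out what is missing can be automated. A minimal sketch, assuming a hypothetical format in which each datagram starts with a 4-byte big-endian counter; the question never describes the payload, so adapt the unpacking to whatever the source really sends.

```python
import struct

# ASSUMPTION: hypothetical payload layout, a 4-byte big-endian counter first.
SEQ_FORMAT = ">I"


def find_gaps(payloads):
    """Return counter values missing between the lowest and highest one received."""
    seqs = {struct.unpack_from(SEQ_FORMAT, p)[0] for p in payloads}
    if not seqs:
        return []
    return [n for n in range(min(seqs), max(seqs) + 1) if n not in seqs]


if __name__ == "__main__":
    fake = [struct.pack(SEQ_FORMAT, n) + bytes(508) for n in (0, 1, 2, 5, 6, 9)]
    print(find_gaps(fake))  # [3, 4, 7, 8]
```

This only sees gaps between the first and last datagram received, and it ignores counter wrap-around; both would need handling for a long-running capture.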

Furthermore, you could use Wireshark to verify that your host is actually receiving the data and, at the same time, check the size of the datagrams. Match the capture against the data your recv thread is providing.


Update

You say that you're losing data, but not which data. I see two possibilities for data loss:

  • Truncating packets
  • Dropping packets

You've said that the payload size is the same as the size you pass to recv, so I'll assume you're not truncating.

So packet drops come down to a combination of the rate of receipt, the rate at which you read from the receive buffer, and the receive-buffer size.
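Those three factors can be combined into a back-of-the-envelope estimate. If datagrams arrive faster than you read them, the backlog grows at (arrival rate - read rate) x datagram size bytes per second, and the buffer overflows after buffer size / growth rate seconds. The function name and the buffer sizes below are illustrative, not taken from the question.

```python
def buffer_headroom_seconds(buffer_bytes, arrival_pps, read_pps, datagram_bytes):
    """Upper bound on how long a receive buffer absorbs a backlog before drops begin.

    Kernels also charge per-datagram bookkeeping overhead against the buffer,
    so the real figure is lower than this estimate.
    """
    backlog_bytes_per_s = (arrival_pps - read_pps) * datagram_bytes
    if backlog_bytes_per_s <= 0:
        return float("inf")  # the reader keeps up, so the buffer never fills
    return buffer_bytes / backlog_bytes_per_s


if __name__ == "__main__":
    # 1200 datagrams/s of 512 bytes arrive at 614,400 bytes/s.
    print(buffer_headroom_seconds(614_400, 1200, 0, 512))     # 1.0 (reader stalled)
    print(buffer_headroom_seconds(614_400, 1200, 1080, 512))  # 10.0 (reader 10% slow)
```

The headroom scales linearly with the buffer size, so a bigger buffer only postpones drops while the reader stays slower than the sender.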

Your calls to Queue.put may be slowing down your rate of read.

So first confirm that you can read 1200 packets per second: modify readFromUDPSocket so that it doesn't call Queue.put, but instead counts the number of receives and reports the time taken.
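A minimal sketch of that measurement, assuming an already-bound UDP socket; time_recv_loop is a hypothetical helper, not part of the question's code. It deliberately does nothing with each payload, so the result is the best-case read rate.

```python
import socket
import time


def time_recv_loop(sock, num_packets, packet_size, idle_timeout=1.0):
    """Receive up to num_packets datagrams without queueing them.

    Returns (count, elapsed_seconds). If the sender goes quiet for idle_timeout
    seconds the loop stops early, and that idle wait is included in elapsed.
    """
    sock.settimeout(idle_timeout)
    count = 0
    start = time.perf_counter()
    try:
        while count < num_packets:
            sock.recv(packet_size)  # payload discarded on purpose: no Queue.put
            count += 1
    except socket.timeout:
        pass  # sender stopped (or datagrams were lost) before num_packets arrived
    return count, time.perf_counter() - start
```

For example, bind a socket to the port the card sends to and call `count, elapsed = time_recv_loop(sock, 1200 * 10, 512)`. If count falls well short of 12,000 over roughly ten seconds even without the Queue, put() is not the only problem.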

Once you've determined that you can call recv fast enough, the next step is working out what is slowing you down. I suspect it may be your use of Queue, so I suggest batching payloads into groups of N before placing them on the Queue, so that you're not trying to call put at 1200 Hz.
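A minimal sketch of the batching idea, written for Python 3 (`queue` rather than `Queue`). BatchingUDPReader and batch_size are hypothetical names, and the consumer is assumed to accept a list of payloads per get() instead of a single payload.

```python
import queue
import socket
import threading


class BatchingUDPReader(threading.Thread):
    """Hypothetical variant of readFromUDPSocket that calls put() once per batch."""

    def __init__(self, sock: socket.socket, out_queue: queue.Queue,
                 packet_size: int, num_scans: int, batch_size: int = 50):
        super().__init__(daemon=True)
        self.sock = sock
        self.out_queue = out_queue
        self.packet_size = packet_size
        self.num_scans = num_scans
        self.batch_size = batch_size

    def run(self):
        batch = []
        for _ in range(self.num_scans):
            batch.append(self.sock.recv(self.packet_size))
            if len(batch) == self.batch_size:
                # One put (one lock round-trip) per batch_size datagrams.
                self.out_queue.put(batch)
                batch = []
        if batch:
            self.out_queue.put(batch)  # flush the final partial batch
```

With batch_size=60, for example, put() runs about 20 times per second instead of 1200. Note that recv still blocks indefinitely if the source stops sending, exactly as in the original class.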

Since you want to sustain 1200 reads per second, I don't think increasing the socket's receive buffer will get you very far: a larger buffer absorbs short bursts, but it can't make up for a reader that is persistently slower than the sender.

MattH
accept() is for TCP. UDP is connectionless.
Daniel Stutzbach
Thanks. I'm indeed using Wireshark (great for network diagnosis), and it reports the data size as 512 bytes, which matches the packet size I'm using.
666craig
A: 

Firstly, can you set the receive buffer size for the socket? If so, set it to something very large, as this lets the UDP stack buffer more datagrams for you.
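In Python this is a setsockopt call with SO_RCVBUF. A minimal sketch; the function name and the 4 MiB default are illustrative. Reading the value back matters because the OS may cap the request: on Linux, for instance, it is limited by the net.core.rmem_max sysctl and the reported value is doubled to cover bookkeeping overhead, so a silently capped request can look as if it had no effect.

```python
import socket


def open_udp_receiver(port, rcvbuf_bytes=4 * 1024 * 1024, host="0.0.0.0"):
    """Open a UDP socket with an enlarged receive buffer.

    Returns (sock, granted), where granted is the buffer size the OS reports
    back; it may differ from what was requested.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a bigger kernel buffer so bursts queue up instead of being dropped.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind((host, port))
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted
```

Printing granted next to the requested size is a quick way to see whether the OS actually honoured the request.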

Secondly, if you can use asynchronous I/O, post multiple recv calls at once (again, this allows the stack to service more datagrams before it starts to drop them).

Thirdly, you could try unrolling your loop a little and reading multiple datagrams before placing them in your queue. Could the locking on the queue be causing the recv thread to run slowly?

Finally, the datagrams may be being dropped elsewhere on the network, in which case there may be nothing you can do; that's the U in UDP...

Len Holgate
Changing the buffer size for the socket didn't have any effect, but I'll try using multiple recv calls in the loop.
666craig
How quickly can you process the datagrams? Can you process them faster than they arrive? If not, the stack WILL drop datagrams no matter how big you make the buffers. What does Wireshark show when run on the machine running the Python code? Are all the expected datagrams getting to that machine? They may be being dropped on the network. What does Wireshark show when run on the machine generating the datagrams? It may think it's generating 1200/sec, but the outbound stack may be dropping some, so they never reach the wire...
Len Holgate
A: 

It seems that the problem is with the source. There are two issues:

  1. Wireshark shows that the source is not consistently transmitting 1200 packets per second. This may be, as Len pointed out, the outbound stack dropping data. BTW, the source is a programmable card whose Ethernet port is connected to my machine.

  2. The other issue is that after the first 15 or so packets there is always a drop. I discovered that if I recv 20 packets in the initialisation part of the readFromUDPSocket thread, I can then read the data fine, e.g.:

import threading

class readFromUDPSocket(threading.Thread):

    def __init__(self, socketUDP, readDataQueue, packetSize, numScans):
        threading.Thread.__init__(self)
        self.socketUDP = socketUDP
        self.readDataQueue = readDataQueue
        self.packetSize = packetSize
        self.numScans = numScans
        # Workaround: read and discard the first 20 datagrams.
        # This runs in the constructing thread, before start() is called.
        for _ in range(20):
            self.socketUDP.recv(self.packetSize)

    def run(self):
        # Read numScans datagrams and hand each one to the consumer thread.
        for _ in range(self.numScans):
            buffer = self.socketUDP.recv(self.packetSize)
            self.readDataQueue.put(buffer)
        self.socketUDP.close()
        print('myServer finished!')

I'm not sure what this points to! I think it does rule out the possibility that I can't recv and put fast enough, though.
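Point 1 (the source not sustaining 1200 packets per second) can also be cross-checked from the receiving side. A minimal sketch, assuming you record an arrival timestamp (for example with time.monotonic()) for every datagram in the recv loop; packets_per_second is a hypothetical helper. Comparing each bin with the expected 1200 shows which seconds came up short, though by itself it cannot tell you where in the path the datagrams were lost.

```python
from collections import Counter


def packets_per_second(timestamps):
    """Bucket arrival times (in seconds) into whole-second bins from the first arrival."""
    if not timestamps:
        return []
    t0 = min(timestamps)
    bins = Counter(int(t - t0) for t in timestamps)
    # Include empty seconds as 0 so gaps in transmission stay visible.
    return [bins.get(i, 0) for i in range(max(bins) + 1)]


if __name__ == "__main__":
    print(packets_per_second([0.0, 0.5, 0.99, 1.0, 1.2, 2.5]))  # [3, 2, 1]
```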

666craig
IMHO you should be treating this as an optimisation step and no more. The program is, quite possibly, operating as it will in a real-world situation. You ARE going to have dropped datagrams due to the nature of UDP. Everything downstream from your recv() call should deal with that, and it should be considered normal. IF you get everything, great. If not, make do with what you DO get; that's probably why it's being sent as UDP in the first place... What you are seeing now is possibly due to thread scheduling.
Len Holgate
A:

UDP is, by definition, unreliable. You must not write programs that expect UDP datagrams to always get through.

Packets are dropped all the time in TCP too, but your program does not need to care, because TCP applications never see packets; the TCP stack shows your application a stream of bytes. There is a lot of machinery there to make sure that if you send the bytes 'ABCD', you will see 'A' 'B' 'C' 'D' at the other end. You may get any possible grouping of them, of course: 'ABC', 'D', or 'AB', 'CD', etc. Or you may just see 'ABC', and then nothing.

TCP isn't "reliable" because it can magically make your network cables never fail or break; the guarantee that it provides is that up until the point where the stream breaks, you will see everything in order. And after the stream breaks, you'll see nothing.

In UDP there is no such guarantee. If you send four UDP datagrams, 'AB', 'CD', 'EF', 'GH', you may receive all of them, or none of them, or half of them, or just one of them. You may receive them in any order. The only guarantee that UDP tries to provide is that you won't see a message with 'ABCD' in it, because those bytes are in different datagrams.
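The message-boundary guarantee is easy to see over loopback. A minimal sketch (send_datagrams is an illustrative name): each message goes out as its own datagram and each recv() returns at most one of them. On loopback you will usually get all four back in order, but nothing in UDP promises that.

```python
import socket


def send_datagrams(messages, bufsize=1024, timeout=1.0):
    """Send each message as its own UDP datagram over loopback; return what recv() yields."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))
        rx.settimeout(timeout)
        for message in messages:
            tx.sendto(message, rx.getsockname())
        received = []
        try:
            while len(received) < len(messages):
                # bufsize exceeds every message here, so each recv() returns one
                # whole datagram, never two datagrams merged together.
                received.append(rx.recv(bufsize))
        except socket.timeout:
            pass  # lost datagrams simply never show up; UDP raises no error
        return received
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    print(send_datagrams([b"AB", b"CD", b"EF", b"GH"]))
```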

To sum up: this has nothing to do with Python, or threads, or GTK. It's just a basic fact of life on networks based in physical reality: sometimes the electrical characteristics of your wires are not conducive to getting your messages all the way across them.

You may be able to reduce the complexity of your program by using Twisted, specifically its listenUDP API, because then you won't need to juggle threads or their interaction with GTK: you can just call methods directly on the widget in question from your datagramReceived method. But this won't fix your underlying problem: UDP just drops data sometimes, period. The real solution is to convince your data source to use TCP instead.
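The answer recommends Twisted's listenUDP and datagramReceived. The sketch below is a stand-in that uses the standard library's asyncio instead, a swapped-in technique chosen only so the example needs no third-party install: create_datagram_endpoint plays the role of listenUDP, and datagram_received plays the role of datagramReceived. It does not show GTK integration, and CollectingProtocol and receive_n are hypothetical names.

```python
import asyncio


class CollectingProtocol(asyncio.DatagramProtocol):
    """The event loop calls datagram_received once per arriving datagram."""

    def __init__(self, expected, done):
        self.expected = expected
        self.done = done  # future resolved once `expected` datagrams have arrived
        self.received = []

    def datagram_received(self, data, addr):
        # Runs on the event-loop thread: no extra thread, queue or lock involved.
        self.received.append(data)
        if len(self.received) >= self.expected and not self.done.done():
            self.done.set_result(list(self.received))


async def receive_n(expected, payloads, timeout=2.0):
    """Listen on an ephemeral loopback port, send payloads to it, return what arrives."""
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    transport, _ = await loop.create_datagram_endpoint(
        lambda: CollectingProtocol(expected, done), local_addr=("127.0.0.1", 0))
    try:
        sender, _ = await loop.create_datagram_endpoint(
            asyncio.DatagramProtocol, remote_addr=transport.get_extra_info("sockname"))
        for payload in payloads:
            sender.sendto(payload)
        sender.close()
        # Raises a timeout error if some datagrams never arrive.
        return await asyncio.wait_for(done, timeout)
    finally:
        transport.close()


if __name__ == "__main__":
    print(asyncio.run(receive_n(3, [b"AB", b"CD", b"EF"])))
```

In a real receiver you would update your own state (or the widget) inside datagram_received rather than collecting into a list. As the answer says, this simplifies the threading but does not stop UDP from dropping datagrams.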

Glyph
@Lefkowitz, it's generally helpful to leave a comment as to why you've gone to the effort of marking an answer down.
MattH