I recall reading somewhere that if a UDP packet actually makes it to the application layer, the data can be assumed to be intact. Disregarding the possibility of someone in the middle sending fake packets, will the data I receive at the application layer always be what was sent out?
UDP uses a 16-bit checksum, so you have a reasonable amount of assurance that the data has not been corrupted at the link layer. However, this is not an absolute guarantee. It is always good to validate any incoming data at the application layer, when possible.
Please note that the checksum is technically optional in IPv4 (it is mandatory in IPv6). This should further drop your "absolute confidence" level for packets sent over the internet.
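For reference, the checksum UDP uses (defined in RFC 768, with the computation spelled out in RFC 1071) can be sketched in a few lines. Note that the real UDP checksum also covers a pseudo-header of IP addresses, protocol, and length, which this sketch omits:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum used by UDP (RFC 768 / RFC 1071)."""
    if len(data) % 2:                          # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # sum 16-bit big-endian words
    while total >> 16:                         # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                     # ones' complement of the sum

c = internet_checksum(b"hello world!")
# appending the checksum makes the whole datagram sum-check to zero,
# which is how the receiver verifies it
assert internet_checksum(b"hello world!" + c.to_bytes(2, "big")) == 0
```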
Theoretically a packet might arrive corrupted: the packet has a checksum, but a checksum isn't a very strong check. I'd guess that that kind of corruption is unlikely though, because if it's being sent via a noisy modem or similar, the media layer is likely to have its own, stronger corruption detection.
Instead I'd guess that the most likely forms of corruption are lost packets (not arriving at all), packets being duplicated (two copies of the same packet arriving), and packets arriving out of sequence (a later one arriving before an earlier one).
Not really. And it depends on what you mean by "Correct".
UDP packets have a checksum that is checked at the transport layer (below the application layer), so if you get a UDP packet at the application layer, you can assume the checksum passed.
However, there is always the chance that the packet was damaged and the checksum was similarly damaged so that it still appears correct. This would be extremely rare; with today's modern hardware it would be really hard for this to happen. Also, if an attacker had access to the packet, they could just update the checksum to match whatever data they changed.
See RFC 768 for more on UDP (quite small for a tech spec :).
You are guaranteed only that the checksum is consistent with the header and data in the UDP packet. The odds of a checksum matching corrupted data or header are 1 in 2^16. Those are good odds for some applications, bad for others. If someone along the chain is dropping checksums, you're hosed, and have no way of even guessing whether any part of the packet is "correct". For that, you need TCP.
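Those 1-in-2^16 odds are for random corruption; some corruptions are invisible by construction. Because the checksum is just a sum, reordering the 16-bit words of a packet never changes it. A small demonstration (re-implementing the RFC 1071 sum for illustration):

```python
def internet_checksum(data: bytes) -> int:
    # 16-bit ones' complement sum (RFC 1071), padded to even length
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

original = b"ABCDEF"   # three 16-bit words: 0x4142 0x4344 0x4546
swapped  = b"CDABEF"   # first two words transposed "in transit"

# addition is commutative, so this corruption is undetectable by the checksum
assert internet_checksum(original) == internet_checksum(swapped)
```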
UDP uses a 16-bit optional checksum. Packets which fail the checksum test are dropped.
Assuming a perfect checksum, then 1 out of 65536 corrupt packets will not be noticed. Lower layers may have checksums (or even stronger methods, like 802.11's forward error correction) as well. Assuming the lower layers pass a corrupt packet to IP every n packets (on average), and all the checksums are perfectly uncorrelated, then every 65536*n packets your application will see corruption.
Example: Assume the underlying layer also uses a 16-bit checksum, so one out of every 2^16 * 2^16 = 2^32 corrupt packets will pass through corrupted. If 1/100 packets are corrupted, then the app will see 1 corruption per 2^32*100 packets on average.
If we call that 1/(65536*n) number p, then you can calculate the chance of seeing no corruption at all as (1-p)^i where i is the number of packets sent. In the example, to get up to a 0.5% chance of seeing corruption, you need to send nearly 2.2 billion packets.
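The arithmetic behind that figure can be checked directly, plugging in the example numbers from above:

```python
import math

# Example from the text: 1 in 100 packets corrupted on the wire, and two
# independent 16-bit checksums each letting 1 in 2**16 corruptions through.
p = (1 / 100) / 2**32   # chance a given packet reaches the app corrupted

# Chance of no corruption in i packets is (1 - p)**i; solve for the packet
# count i that gives a 0.5% chance of seeing at least one corruption.
i = math.log(1 - 0.005) / math.log(1 - p)
print(f"{i:.3e}")       # close to 2.2e9, the "nearly 2.2 billion" figure
```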
(Note: In the real world, the chance of corruption depends on both packet count and size. Also, none of these checksums are cryptographically secure; it is trivial for an attacker to corrupt a packet undetectably. The above applies only to random corruption.)
It's worth noting that the same 16-bit checksum applies to TCP as well as UDP on a per-packet basis. When characterizing the properties of UDP, consider that the majority of data transfers that take place on the Internet today use TCP. When you download a file from a web site, the same checksum is used for the transmission.
The secret is that the physical and link layers (L1/L2) of most access technologies are significantly more robust than TCP's checksum, and the combined chance of an error slipping past L1 and L2 is very low.
For example modems had error correcting hardware and the PPP layer also had its own checksum.
DSL is the same way, with error correction (Reed-Solomon codes) at the ATM layer and a CRC at the PPPoA layer.
DOCSIS cable modems use similar technology to that of DSL for error detection and correction.
The end result is that errors in modern systems are extremely unlikely to ever get past L1.
I have seen clock issues on old frame relay circuits 14 years ago routinely cause corruption at the TCP layer. I have also heard stories of bit-flip patterns on malfunctioning hardware that cancelled out in the CRC and corrupted TCP.
So yes, it is possible for corruption to get through, and yes, you should implement your own error detection if the data is very important. In practice, on the Internet and private networks, it's a rare occurrence today.
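As a sketch of what "your own error detection" might look like, one simple option is to prepend a CRC-32 to each payload before it goes into a UDP datagram. The `frame`/`unframe` helpers here are illustrative, not a standard API; note that CRC-32 catches random corruption far better than a 16-bit checksum but, like it, is no defense against a deliberate attacker:

```python
import struct
import zlib

def frame(payload: bytes) -> bytes:
    """Prepend a big-endian CRC-32 of the payload."""
    return struct.pack("!I", zlib.crc32(payload)) + payload

def unframe(datagram: bytes) -> bytes:
    """Verify and strip the CRC-32; raise if the payload was corrupted."""
    crc = struct.unpack("!I", datagram[:4])[0]
    payload = datagram[4:]
    if zlib.crc32(payload) != crc:
        raise ValueError("payload failed CRC-32 check")
    return payload

msg = frame(b"important data")           # send this via sock.sendto(...)
assert unframe(msg) == b"important data" # and check it on sock.recvfrom(...)
```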
All hardware (disk drives, buses, processors, even ECC memory) has its own error probabilities; for most applications they're low enough that we take them for granted.