Does TCP/IP prevent multiple copies of the same packet from reaching the destination? Or is it up to the endpoint to layer idempotency logic above it?
Please reference specific paragraphs from the TCP/IP specification if possible.
I don't know about packet repetition, but I've never encountered it using TCP/IP. I know that TCP guarantees that the packets all arrive, and in the correct order, so I can't see why it wouldn't handle duplicates as well.
Layers below TCP can see duplicated or dropped packets. Layers above TCP see neither.
It's the TCP stack's job to recover from duplicate packets:
The TCP must recover from data that is damaged, lost, duplicated, or delivered out of order by the internet communication system. This is achieved by assigning a sequence number to each octet transmitted, and requiring a positive acknowledgment (ACK) from the receiving TCP. If the ACK is not received within a timeout interval, the data is retransmitted. At the receiver, the sequence numbers are used to correctly order segments that may be received out of order and to eliminate duplicates. Damage is handled by adding a checksum to each segment transmitted, checking it at the receiver, and discarding damaged segments.
-- RFC 793 - Transmission Control Protocol, Section 1.5
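To make the quoted mechanism concrete, here is a minimal sketch of a receiver that uses sequence numbers to reorder segments and discard duplicates. It is a toy model, not how any real TCP stack is written: the `Receiver` class and its method names are made up for illustration, and it ignores sequence-number wrap-around, windows, checksums, and ACK generation.

```python
class Receiver:
    """Toy receiver: reorders segments by sequence number and drops duplicates."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq   # next octet the application is waiting for
        self.pending = {}             # out-of-order segments, keyed by sequence number
        self.delivered = bytearray()  # the byte stream the application would read

    def on_segment(self, seq, data):
        """Return True if the segment carried new data, False if it was a duplicate."""
        if seq + len(data) <= self.next_seq or seq in self.pending:
            return False              # every octet was already seen: discard
        self.pending[seq] = data
        self._drain()
        return True

    def _drain(self):
        """Deliver every buffered segment that is now contiguous with the stream."""
        while True:
            ready = [s for s in self.pending if s <= self.next_seq]
            if not ready:
                return
            for s in ready:
                chunk = self.pending.pop(s)
                fresh = chunk[self.next_seq - s:]  # skip octets already delivered
                self.delivered += fresh
                self.next_seq += len(fresh)


receiver = Receiver(initial_seq=100)
receiver.on_segment(100, b"abc")   # in order: delivered immediately
receiver.on_segment(106, b"ghi")   # arrives early: buffered
receiver.on_segment(100, b"abc")   # retransmitted copy: dropped
receiver.on_segment(103, b"def")   # fills the gap: "def" and "ghi" are delivered
print(bytes(receiver.delivered))
```

The application only ever sees `delivered`, which is why code reading from a socket does not observe the duplicates that occur on the wire.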
However, if the same data is sent again with new sequence numbers, then no: TCP treats it as new data and delivers it.
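That last case is where endpoint idempotency logic comes in, for example when a client retries a request over a fresh connection. A rough sketch, assuming the application attaches its own request ID to each message (the `Account` class and the `request_id` scheme are hypothetical and not part of TCP):

```python
class Account:
    """Applies each deposit at most once, keyed by a caller-chosen request ID."""

    def __init__(self):
        self.balance = 0
        self.seen = {}  # request_id -> balance right after that request was applied

    def deposit(self, request_id, amount):
        if request_id in self.seen:
            # A retry of a request that was already applied: return the old result.
            return self.seen[request_id]
        self.balance += amount
        self.seen[request_id] = self.balance
        return self.balance


account = Account()
account.deposit("req-1", 50)
account.deposit("req-1", 50)   # same request sent twice: applied once
account.deposit("req-2", 50)
print(account.balance)
```

A real service would also need to bound or expire the `seen` table, which this sketch leaves out.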
TCP uses sequence numbers to detect duplication in the case of retransmission, which will also prevent trivial replay attacks.
From RFC 793, Section 3.3 - Sequence Numbers:
A fundamental notion in the design is that every octet of data sent over a TCP connection has a sequence number. Since every octet is sequenced, each of them can be acknowledged. The acknowledgment mechanism employed is cumulative so that an acknowledgment of sequence number X indicates that all octets up to but not including X have been received. This mechanism allows for straight-forward duplicate detection in the presence of retransmission. Numbering of octets within a segment is that the first data octet immediately following the header is the lowest numbered, and the following octets are numbered consecutively.
The duplicate detection will ensure that the same packet cannot be trivially retransmitted. Sequence numbers will also ensure that insertion (rather than replacement) of data in the data stream will be noticed, as further legitimate packets following forged packets will have duplicate sequence numbers, which will disrupt the data flow. This will likely cause those packets to be dropped as duplicates, which will likely break the protocol being used.
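The cumulative acknowledgment from the Section 3.3 quote can be illustrated with a few lines of arithmetic. This is only a sketch: `cumulative_ack` is a made-up helper, segments are given as `(sequence number, length)` pairs, and wrap-around of the sequence space is ignored.

```python
def cumulative_ack(initial_seq, segments):
    """Return the ACK number: the first octet not yet received contiguously."""
    ack = initial_seq
    for seq, length in sorted(segments):
        if seq > ack:
            break                      # gap: later octets cannot be acknowledged yet
        ack = max(ack, seq + length)   # duplicates and overlaps never move it backwards
    return ack


# Octets 1000-1199 arrived, 1200-1299 are missing, 1300-1399 arrived early.
print(cumulative_ack(1000, [(1000, 100), (1100, 100), (1300, 100)]))
```

Because the ACK only says "everything up to but not including X", a retransmitted segment below X is recognizable as a duplicate from its sequence number alone.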
More information about the original (1981) TCP/IP specification can be found in RFC 793, and the many other RFCs involving extensions or modifications to the TCP/IP protocol.
Yes, the TCP layer prevents duplicate packets. The IP layer below it does not.
Details in RFC 1122.
It really depends on how you are receiving your data. Technically the protocol should not give you duplicates (i.e. packets with the same TCP checksum), but other factors could cause you to see them, for example the network hardware you are using. Also, if you are using sniffers to look at TCP streams rather than just reading an open socket in your application, you can get duplicate packets from the sniffers even if the TCP streams they were monitoring had none.
To give a real-world example: at the moment I'm working on some TCP analysis of internal networks for a major stock exchange, and the data I'm looking at comes in from multiple sniffers and is spliced back together. In pulling in the data, I've found that I need to do a number of pre-processing steps, including finding and removing duplicates. For example, in a stream I just read in, of approximately 60,000 data packets, I located and removed 95 duplicate packets.
The strategy I take here is to keep a rolling window of the 10 most recent TCP checksums and to ignore packets that match those checksums. Note this works well for PSH packets, but not so well for ACK packets, though I'm less concerned with those anyway.
I've written a special collection for tracking this rolling window of TCP checksums, which might be helpful to others:
    using System;
    using System.Collections.Generic;
    using System.Threading;

    /// <summary>
    /// Combination of a doubly linked list and a hash set with a maximum size.
    /// Works like a bounded queue: once full, each new item evicts the oldest one.
    /// Re-uses evicted list nodes to avoid extra garbage collection.
    /// Public Add() and Contains() methods are thread safe through a ReaderWriterLockSlim.
    /// </summary>
    public class BoundedHashQueue<T>
    {
        private readonly int _maxSize;
        private readonly HashSet<T> _hashSet = new HashSet<T>();
        private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
        // _head and _tail are the same sentinel node, so the list is circular.
        private readonly Item _head;
        private readonly Item _tail;
        private int _currentCount = 0;

        public BoundedHashQueue(int maxSize)
        {
            if (maxSize < 1) throw new ArgumentOutOfRangeException(nameof(maxSize));
            _maxSize = maxSize;
            _head = _tail = new Item();
            // An empty list is the sentinel linked to itself.
            _head.Next = _tail;
            _tail.Previous = _head;
        }

        private class Item
        {
            internal T Value;
            internal Item Next;
            internal Item Previous;
        }

        public void Add(T value)
        {
            _lock.EnterWriteLock();
            try
            {
                // Already tracked: a second node would put the list and set out of sync.
                if (!_hashSet.Add(value)) return;

                Item item;
                if (_currentCount >= _maxSize)
                {
                    // Full: unlink the oldest node (just before the tail) and re-use it.
                    item = _tail.Previous;
                    _tail.Previous = item.Previous;
                    _tail.Previous.Next = _tail;
                    _hashSet.Remove(item.Value);
                }
                else
                {
                    item = new Item();
                    _currentCount++;
                }
                // Link the node in right after the head (the newest position).
                item.Value = value;
                item.Next = _head.Next;
                item.Next.Previous = item;
                item.Previous = _head;
                _head.Next = item;
            }
            finally
            {
                _lock.ExitWriteLock();
            }
        }

        public bool Contains(T value)
        {
            _lock.EnterReadLock();
            try { return _hashSet.Contains(value); }
            finally { _lock.ExitReadLock(); }
        }
    }
You don't fully understand the problem. See this link: http://en.wikipedia.org/wiki/Transmission_Control_Protocol
That page says:
"TCP timestamps, defined in RFC 1323, help TCP compute the round-trip time between the sender and receiver. Timestamp options include a 4-byte timestamp value, where the sender inserts its current value of its timestamp clock, and a 4-byte echo reply timestamp value, where the receiver generally inserts the most recent timestamp value that it has received. The sender uses the echo reply timestamp in an acknowledgment to compute the total elapsed time since the acknowledged segment was sent.[2]
TCP timestamps are also used to help in the case where TCP sequence numbers encounter their 2^32 bound and "wrap around" the sequence number space. This scheme is known as Protect Against Wrapped Sequence numbers, or PAWS (see RFC 1323 for details)."
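For context on the wrap-around mentioned in the quote: RFC 793 requires sequence-number arithmetic to be done modulo 2**32, so "earlier" and "later" have to be decided on a circle rather than a line. A small sketch of one common way to do that comparison (the half-window rule and the name `seq_lt` are assumptions for illustration; this is not PAWS itself):

```python
MOD = 2 ** 32  # size of the TCP sequence number space


def seq_lt(a, b):
    """True if sequence number a comes before b, treating the space as circular."""
    return a != b and (b - a) % MOD < MOD // 2


print(seq_lt(10, 20))           # plain case, no wrap
print(seq_lt(0xFFFFFFF0, 5))    # 5 lies just past the wrap point
```

A comparison like this cannot tell a very old segment from a new one once the numbers have gone all the way around, which is the ambiguity the timestamp-based PAWS check is meant to resolve.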
Regards, Joint (Poland)
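You seem to be concerned about two different things:
1. Whether TCP can deliver duplicated data to your application
2. Whether an attacker can replay captured packets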
Answer to 1:
TCP guarantees reliable, in-order delivery of a sequence of bytes. Whatever data the client application sends to TCP via write() will come out exactly the same during the server's read() call.
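The write()/read() behavior can be tried on a loopback connection. This is a sketch under a few assumptions: `roundtrip` is a made-up helper, both ends live in one thread, and the payload is small enough to fit in the socket buffers (a large payload would block in `sendall`).

```python
import socket


def roundtrip(payload):
    """Send payload over a loopback TCP connection and return what the reader gets."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
        listener.listen(1)
        with socket.create_connection(listener.getsockname()) as client:
            server, _ = listener.accept()
            with server:
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
                chunks = []
                while True:
                    chunk = server.recv(4096)    # may return fewer bytes than were written
                    if not chunk:
                        break
                    chunks.append(chunk)
                return b"".join(chunks)


data = b"hello over TCP"
print(roundtrip(data) == data)
```

Note that the guarantee is about the byte stream as a whole: a single write can be split across several reads, so the reader loops until the stream ends.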
Answer to 2:
Replay attacks do not work well with TCP, since every connection depends on two random 32-bit numbers generated by the client and server respectively. For a replay attack to work, the attacker must guess the sequence number generated by the server for the fake connection it is initiating (theoretically, the attacker has a 1 / 2**32 chance of guessing correctly). If the attacker guesses incorrectly, she will at worst cause some buffering of data in your OS.
Note that just because a replay attack doesn't work, nothing prevents an attacker from forming a legitimate connection with your server and transmitting whatever data stream she wants to your application. This is why it's important to always validate input.