The socket API is the de facto standard for TCP/IP and UDP/IP communications (that is, networking code as we know it). However, one of its core functions, accept(), is a bit magical.

To borrow a semi-formal definition:

accept() is used on the server side. It accepts a received incoming attempt to create a new TCP connection from the remote client, and creates a new socket associated with the socket address pair of this connection.

In other words, accept returns a new socket through which the server can communicate with the newly connected client. The old socket (on which accept was called) stays open, on the same port, listening for new connections.
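To make that concrete, here is a minimal sketch in Python (chosen only for illustration; the C API behaves the same way). It assumes everything runs on loopback in a single process, and the variable names are made up for this example. It checks that the accepted socket is a new descriptor whose local port is the listening port, and that the listening socket can keep accepting afterwards.

```python
import socket

# Listening socket bound to an OS-chosen free port on loopback (assumption:
# loopback is available; port 0 asks the OS to pick a port).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
server_port = listener.getsockname()[1]

# The handshake completes inside the kernel, so connect() returns even though
# accept() has not been called yet; the connection waits in the backlog.
client = socket.create_connection(("127.0.0.1", server_port))
client_addr = client.getsockname()
conn, peer = listener.accept()

# The accepted socket is a different descriptor with the same local port.
same_port = conn.getsockname()[1] == server_port
new_descriptor = conn.fileno() != listener.fileno()

# The original socket is still listening and can accept another client.
client2 = socket.create_connection(("127.0.0.1", server_port))
conn2, peer2 = listener.accept()
still_listening = conn2.getsockname()[1] == server_port

print(same_port, new_descriptor, still_listening)

for s in (client, client2, conn, conn2, listener):
    s.close()
```

The address returned by accept() (`peer`) is the client's own IP and port, which is the only part that differs between `conn` and `conn2`.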

How does accept work? How is it implemented? There's a lot of confusion on this topic. Many people claim accept opens a new port and you communicate with the client through it. But this obviously isn't true, as no new port is opened. You actually can communicate through the same port with different clients, but how? When several threads call recv on the same port, how does the data know where to go?

I guess it's something along the lines of the client's address being associated with a socket descriptor, and whenever data comes through recv it's routed to the correct socket, but I'm not sure.

It'd be great to get a thorough explanation of the inner-workings of this mechanism.

+10  A: 

Your confusion lies in thinking that a socket is identified by Server IP : Server Port. In actuality, a connected socket is uniquely identified by a quartet of information:

Client IP : Client Port and Server IP : Server Port

So while the Server IP and Server Port are constant in all accepted connections, the client side information is what allows it to keep track of where everything is going.

Example to clarify things:

Say we have a server at 192.168.1.1:80 and two clients, 10.0.0.1 and 10.0.0.2.

10.0.0.1 opens a connection on local port 1234 and connects to the server. Now the server has one socket identified as follows:

10.0.0.1:1234 - 192.168.1.1:80

Now 10.0.0.2 opens a connection on local port 5678 and connects to the server. Now the server has two sockets identified as follows:

10.0.0.1:1234 - 192.168.1.1:80
10.0.0.2:5678 - 192.168.1.1:80
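The same situation can be reproduced on one machine. The following is a rough sketch, assuming both clients run on loopback (so they share the IP 127.0.0.1 and differ only in their port); the helper `recv_exact` and the variable names are invented for this example. Each accepted socket is filed under the client address reported by accept(), and each one receives only the bytes of its own connection even though both share the server port.

```python
import socket

def recv_exact(sock, n):
    """Read exactly n bytes (or fewer if the peer closes early)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))          # OS picks a free port
listener.listen(5)
server_addr = listener.getsockname()

client_a = socket.create_connection(server_addr)
client_b = socket.create_connection(server_addr)
addr_a, addr_b = client_a.getsockname(), client_b.getsockname()

# File each accepted socket under the client's (IP, port).
accepted = {}
for _ in range(2):
    conn, peer = listener.accept()
    accepted[peer] = conn

# The "quartet" for each connection: client side differs, server side is shared.
four_tuples = [(peer, conn.getsockname()) for peer, conn in accepted.items()]

client_a.sendall(b"from A")
client_b.sendall(b"from B")

# Each accepted socket sees only its own client's data.
received = {peer: recv_exact(conn, 6) for peer, conn in accepted.items()}
print(four_tuples)
print(received)

for s in (client_a, client_b, listener, *accepted.values()):
    s.close()
```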

17 of 26
Does the TCP/IP (which is it, btw?) tag the address-port pair in some way, hashing it into the socket to which to transfer the data for it?
Eli Bendersky
I don't know the implementation details (which probably vary from platform to platform), I just know that conceptually the sockets are identified by the quartet of information I described.
17 of 26
Do you have any reference on this?
qeek
+4  A: 

Back when I was working at Gandalf Canada (around 1992-93), the Stevens book "TCP/IP Illustrated" was invaluable for understanding all aspects of network communication. Unfortunately I haven't retained enough to answer your question, but I bet that book could. There is also a sequel, "TCP/IP Illustrated, Volume 2: The Implementation".

Paul Tomblin
A: 

As the other guy said, a socket is uniquely identified by a 4-tuple (Client IP, Client Port, Server IP, Server Port).

The server process running on the Server IP maintains a database (meaning I don't care what kind of table/list/tree/array/magic data structure it uses) of active sockets and listens on the Server Port. When it receives a message (via the server's TCP/IP stack), it checks the Client IP and Port against the database. If the Client IP and Client Port are found in a database entry, the message is handed off to an existing handler, else a new database entry is created and a new handler spawned to handle that socket.
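As a toy model of that lookup (not how any real stack is written, and in most operating systems the table lives in the kernel's TCP/IP stack rather than in the server process), the "database" can be sketched as a dictionary keyed by the 4-tuple. The class and method names here are hypothetical, and a `syn` flag stands in for the connection-opening packet: only a SYN addressed to a listening port may create a new entry.

```python
from dataclasses import dataclass, field

@dataclass
class Connection:
    """Toy stand-in for a connected socket's receive queue."""
    inbox: list = field(default_factory=list)

class ToyTcpDemux:
    """Hypothetical demultiplexer: routes segments by 4-tuple."""

    def __init__(self):
        self.listening = set()   # {(server_ip, server_port)}
        self.connections = {}    # {(client_ip, client_port, server_ip, server_port): Connection}

    def listen(self, server_ip, server_port):
        self.listening.add((server_ip, server_port))

    def deliver(self, client_ip, client_port, server_ip, server_port,
                payload=b"", syn=False):
        key = (client_ip, client_port, server_ip, server_port)
        if syn:
            # A connection request: create an entry only for a listening port.
            if key not in self.connections and (server_ip, server_port) in self.listening:
                self.connections[key] = Connection()
                return "new connection"
            return "rejected"
        conn = self.connections.get(key)
        if conn is None:
            # Data for a 4-tuple nobody opened: not routed anywhere.
            return "rejected"
        conn.inbox.append(payload)
        return "delivered"

demux = ToyTcpDemux()
demux.listen("192.168.1.1", 80)
r1 = demux.deliver("10.0.0.1", 1234, "192.168.1.1", 80, syn=True)
r2 = demux.deliver("10.0.0.2", 5678, "192.168.1.1", 80, syn=True)
r3 = demux.deliver("10.0.0.2", 5678, "192.168.1.1", 80, b"GET /")
r4 = demux.deliver("10.0.0.3", 9999, "192.168.1.1", 80, b"stray")
print(r1, r2, r3, r4)
```

A real implementation also tracks connection state, sequence numbers and teardown; the sketch only shows why two clients on the same server port never get each other's data.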

In the early days of the ARPAnet, certain protocols (FTP for one) would listen to a specified port for connection requests, and reply with a handoff port. Further communications for that connection would go over the handoff port. This was done to improve per-packet performance: computers were several orders of magnitude slower in those days.

Can you elaborate on the 'handoff port' part?
Eli Bendersky
This is either a description of some pre-TCP protocol, or overly simplified. A client attempting to connect to a listening socket sends a special packet to establish the connection (SYN bit set). There's a clear distinction between a packet creating a new socket and one using an existing socket.
John M
A: 

What confused me when I was learning this was that the terms socket and port suggest something physical, when in fact they're just data structures the kernel uses to abstract the details of networking.

As such, the data structures are implemented so that connections from different clients can be kept apart. As to how they're implemented, the answer is either (a) it doesn't matter, since the purpose of the sockets API is precisely that the implementation shouldn't matter, or (b) just have a look. Apart from the highly recommended Stevens books, which describe one implementation in detail, check out the source of Linux, Solaris, or one of the BSDs.

+4  A: 

Just to add to the answer given by user "17 of 26"

The socket actually consists of a 5-tuple: (source IP, source port, destination IP, destination port, protocol). Here the protocol could be TCP or UDP or any other transport-layer protocol. The protocol is identified by the 'protocol' field in the IP datagram.

Thus it is possible to have two different applications on the server communicating with the same client on exactly the same 4-tuple, differing only in the protocol field. For example:

Apache at server side talking on (server1.com:880-client1:1234 on TCP) and World of Warcraft talking on (server1.com:880-client1:1234 on UDP)

Both the client and server can handle this, because the protocol field in the IP packet differs between the two cases even though the other four fields are the same.
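A quick way to see the protocol field at work is to bind a TCP socket and a UDP socket to the same port number. This is only a sketch on loopback with made-up variable names; it assumes no other program already holds that UDP port, in which case the second bind would raise OSError.

```python
import socket

# TCP socket on an OS-chosen port.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.bind(("127.0.0.1", 0))
tcp.listen(1)
port = tcp.getsockname()[1]

# UDP socket on the same port number: no conflict, because TCP and UDP
# ports live in separate namespaces (assumption: this UDP port is free).
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.bind(("127.0.0.1", port))
udp.settimeout(2)
udp_port = udp.getsockname()[1]

# A datagram and a TCP stream sent to the same address and port
# each arrive on the socket of the matching protocol.
udp_client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_client.sendto(b"udp hello", ("127.0.0.1", port))
udp_data, _ = udp.recvfrom(1024)

tcp_client = socket.create_connection(("127.0.0.1", port))
conn, _ = tcp.accept()
tcp_client.sendall(b"tcp hello")
tcp_data = conn.recv(9, socket.MSG_WAITALL)   # wait for all 9 bytes

print(port, udp_data, tcp_data)

for s in (udp_client, tcp_client, conn, udp, tcp):
    s.close()
```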

Methos
A: 

(eth/ip/tcp) (eth/ip/udp)

The Ethernet layer carries the MAC address and other information. The original Ethernet and Wi-Fi standards are available for free from the IEEE (ieee.org).

There is a port for each connection.

Server port 80 is listening. Accepting a connection opens a new port: for an IP packet sent to 192.168.1.1:80 from 192.168.1.2, the server switches to, for example, 192.168.1.1:1025 and notifies the client that it changed and accepted the connection.

So the server receives a connect packet with the client's info (port, IP, etc.), then the server sends an accept packet, and the client receives the accept with the new port and IP.

Easy.

A virtual IP is just a record or struct, if you like, and so are the TCP and Ethernet headers, so they can be changed and sent over the network to hack a connection or confuse the system.

www.sfsys.eu

Sérgio Francisco