So I was thinking about languages the other day, and it struck me that any program written in a compiled language that interacts with the Internet is then translated into assembly that has to interact with the Internet. I've just begun learning a bit of x86 assembly to help me understand C++ a bit better, and I'm baffled by how something so low-level could do something like access the Internet.

I'm sure the full answer to this question is much more than would fit in a SO answer, but could somebody give me maybe a basic summary?

+7  A: 

User-space programs that "interact with the internet", in all modern systems, do so by issuing system calls to the underlying operating system, which supplies the API for a TCP/IP stack.

The system calls in question (such as socket, listen, accept, and so forth) are typically documented at a C level, but in each particular OS implementation they will translate to machine code, of course. But whether values go in particular registers, or locations in memory pointed to by particular registers, etc, is pretty minor and totally system-specific.
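To make that concrete, here is a minimal sketch in C that uses those calls to send a few bytes over TCP to itself on the loopback interface. The helper name `loopback_roundtrip` is made up for this illustration, and the sketch assumes a POSIX system (Linux, macOS, BSD); Winsock differs in the details.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the answer above): sends msg over a TCP
 * connection to ourselves on 127.0.0.1 and copies whatever arrives into out.
 * Returns the number of bytes received, or -1 on error. */
ssize_t loopback_roundtrip(const char *msg, char *out, size_t out_len)
{
    ssize_t total = -1;
    int listener = -1, client = -1, server = -1;
    size_t len = strlen(msg);
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* port 0: let the kernel pick a free port */

    /* "Server" side: socket, bind, listen. */
    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) goto done;
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (listen(listener, 1) < 0) goto done;
    /* Ask the kernel which port it actually assigned. */
    if (getsockname(listener, (struct sockaddr *)&addr, &addr_len) < 0) goto done;

    /* "Client" side: socket, connect. The kernel performs the TCP handshake. */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;

    server = accept(listener, NULL, NULL);
    if (server < 0) goto done;

    /* Send the message, then signal end-of-stream to the receiver. */
    if (send(client, msg, len, 0) != (ssize_t)len) goto done;
    shutdown(client, SHUT_WR);

    /* Read until the buffer is full or the peer has closed its side. */
    total = 0;
    while ((size_t)total < out_len) {
        ssize_t n = recv(server, out + total, out_len - (size_t)total, 0);
        if (n < 0) { total = -1; break; }
        if (n == 0) break;
        total += n;
    }

done:
    if (server >= 0) close(server);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    return total;
}
```

Each of `socket`, `bind`, `listen`, `connect`, `accept`, `send`, and `recv` ends up as a trap into the kernel. How the arguments get there (registers, stack, which instruction) is exactly the system-specific part.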

If you're wondering how the machine code (probably also compiled from C) in the kernel and device drivers "interacts with the internet" (in response to system calls), it does so both by building and maintaining in-memory data structures to track the state of various things, and by interacting with the underlying hardware (e.g. via interrupts, I/O ports, memory mapped device areas, or whatever that particular architecture uses) -- just like it interacts with (say) a video display, or a disk device.

Alex Martelli
Yeah... it's hard to imagine such a small user-mode program doing complex stuff like that, but really it's just making a call to a library that was written in something higher-level (C/C++), which is then translated by the compiler into a BUNCH of machine code, which, at some really low level, sends a stream of electrical pulses through an Ethernet cable toward the other side of the world... (if I understand correctly)
advs89
Well, technically the physical Ethernet pulses only go as far as your router. The router is then responsible for forwarding the data on your behalf to the next router, which forwards it to the next one, and so on.
Tyler McHenry
+1  A: 

It depends. When you read about a web script written in C, it's usually a CGI program. CGI is a protocol, not a language. CGI specifies that the web server puts "GET", "POST", etc. into REQUEST_METHOD, "foo=bar&baz=42" into QUERY_STRING, POST data onto stdin, and so on. The CGI program reads these through environment variables and standard input, and writes its response to stdout. The web server uses CGI to communicate with a web script. A program that communicates across the Internet by itself can use the system sockets API.
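As a rough sketch of the CGI side, the C function below builds a plain-text response from the two environment variables mentioned above. The function name `cgi_build_response` and the output format are invented for this example; only REQUEST_METHOD and QUERY_STRING come from CGI itself.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: formats a minimal CGI response describing the request.
 * Returns what snprintf returns (the length the full response would need). */
int cgi_build_response(char *out, size_t out_len)
{
    /* The web server sets these before starting the CGI program. */
    const char *method = getenv("REQUEST_METHOD");
    const char *query = getenv("QUERY_STRING");
    if (method == NULL) method = "(unset)";
    if (query == NULL) query = "";

    /* For POST, the request body would be waiting on stdin. */
    const char *has_body = strcmp(method, "POST") == 0 ? "yes" : "no";

    /* A CGI response is a header block, a blank line, then the body. */
    return snprintf(out, out_len,
                    "Content-Type: text/plain\r\n\r\n"
                    "method=%s\nquery=%s\nbody-on-stdin=%s\n",
                    method, query, has_body);
}
```

A real CGI program would call this from `main` and write the buffer to stdout with something like `fputs(buf, stdout)`. The web server takes care of getting it back to the browser.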

In summary, the operating system does all the communicating. The program just makes the right system calls.

If you wonder how the operating system communicates over the Internet, the answer is that the OS kernel uses a driver to interface with the network card over an I/O port, memory-mapped I/O, etc. The OS and network card implement the Internet protocol standards so that everything works together.

Joey Adams
+1  A: 

What you need to do is to look up some of those PIC web-server projects. Some of them are web-servers written in assembly and running on 8-bit hardware. It will give you a clear idea of how something as low-level as assembly can be used to interact with the rest of the world through the Internet.

It basically involves

  1. Write some low-level drivers (Layer 2) to interface with the networking hardware. This may be Ethernet or even a modem (with SLIP).
  2. Write the next layers, IP and TCP, to process the TCP/IP packets. This will need some assembly magic, as these processes are quite involved.
  3. Write the application layer, whether it be a web server or client or whatever, that exploits the underlying layers.
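To give a feel for step 2, here is a small sketch in C (rather than assembly, for readability) of one piece every IP layer needs: the Internet checksum described in RFC 1071, which is used for the IPv4 header. The function name `internet_checksum` is just a label chosen for this example, and it is not taken from any of the PIC projects.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071): one's-complement sum of the data taken as
 * big-endian 16-bit words, then inverted. Intended for short buffers such
 * as protocol headers, so the 32-bit accumulator cannot overflow. */
uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add up the data two bytes at a time. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero low byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

To build a header, you zero its checksum field, run this over the header, and store the result in the field. A receiver runs the same function over the header as received and expects 0. On an 8-bit micro the same loop is a handful of add-with-carry instructions.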

Hope this clears up some doubt.

sybreon