views:

275

answers:

8

A program is compiled from some language to ASM --> Machine Code (directly executable). When people say that this is platform dependent, the mean that the binaries formed will run (correctly) only on the CPUs with same Instruction Set Architecture like x86, x86-64. It may (incorrectly) / may not (at all) run on other processes because of the difference in ISA. Right?

Now, the concept of binaries is confusing me. Everything is about the "Machine Language Code" & "CPU". Where does the OS come into play? I mean the compiled binary has direct instructions for CPU when it is loaded into memory. & CPU executes one instruction at a time. I couldn't see the role of Operating System any where except in process management link text . It should be running on the CPU of same ISA irrespective of Operating System. right?

But its not the case. If I build a code to x86 on windows machine. It won't run on Mac x86 machine or Linux x86 machine.

I'm missing something here. Please clear my confusion.

A: 

The OS provides the tools and API for access to certain features and the hardware.

For example to create a window on Microsoft Windows, you need the OS's DLL to create the window.

Unless you wish to write the API yourself, you'll use the API that the OS provides. That's where the OS come into play.

thephpdeveloper
At a high level this is correct. However, you couldn't "write the API yourself" since the OS prevents you from accessing the hardware or page table directly. So at some level you'd still need to make OS-specific syscalls.
Robert Fraser
+5  A: 

Two ways:

First and foremost the answer is "system calls". Whenever you call a function that needs to do any I/O, interact with devices, allocate memory, fork processes, etc., that function needs to do a "system call". While the syscall instruction itself is part of X86, the available system calls and parameters to them are OS-specific.

Even if your program doesn't make ANY system calls (which I'm not sure is possible, and certainly wouldn't be very useful) the formats that wrap around the machine code are different for different OSes. So the file formats of exe (PE) and a linux executable (ELF usually) are different, which is why an exe file won't execute on Linux.

EDIT: these are low-level details. The higher-level answer is to say that anything that needs to access files, the console/GUI, allocate memory, etc. is OS-specific.

Robert Fraser
claws
No, it still calls fopen(). Somewhere in fopen is a "syscall" instruction. The syscall instruction changes the processor into "kernel mode", which removes all sorts of protections and allows the system to actually access the hardware. Your program runs in a protected mode and can't access the hardware at all.
Robert Fraser
>While the syscall instruction itself is part of X86, the available system calls and parameters to them are OS-specific.Where can I find them? I just want to have a glance at the different system calls of different OS for same function say "Opening a file". I'm googling but couldn't find what I'm exactly looking for.
claws
For linux: http://www.kernel.org/doc/man-pages/online/pages/man2/syscalls.2.html -- For windows: http://www.metasploit.com/users/opcode/syscalls.html
Robert Fraser
+3  A: 

The OS comes into play when you try to access "a service" which it abstracts out for you at the hardware level, e.g. open a file inside the "database" called filesystem, generate a random number (every modern OS has this feature).

Under GNU/Linux for instance, you got to fill in the registers and call int 80h to access a "service" (actually called "syscall").

Your program won't run on another OS also because there are different file formats for executables, for example Win has COFF/PE, Linux has the ELF file format (just like any other file format, this also contains "meta data", e.g. the HTML (or SGML) file format).

Flavius
NB: That "service" is a sort of low-level function available in kernel mode and not to be confused with a "Windows Service" (aka daemon on *nix OS).
0xA3
+1  A: 

The OS provides (a) the environment that your machine code runs in, and (b) standard services. Without (a), your code will never get to execute in the first place, and without (b), you would have to implement absolutely everything yourself and hit the hardware directly.

Christian Hayter
+5  A: 

For starters, a modern CPU has (at least) two modes, a mode in which it's running the core of the Operating System itself ("kernel mode") and a mode in which it's running programs ("user mode"). When in user mode, the CPU can't do a whole lot of things.

For instance, a mouse click is typically noticed in the kernel, not user mode. However, the OS dispatches the event to user mode and from there to the correct program. The other way around also requires cooperation: a program can't draw to the screen freely, but needs to go through the OS and kernel mode to draw on its part.

Similarly, the act of starting a program is typically a cooperation. The shell part of the OS is a user-mode program too. It gets your mouse click, and determines that it's a mouse click intended to start a process. The shell then tells the kernel-mode part of the OS to start a new process for that program.

When the kernel mode needs to start a new process, it first allocates memory for bookkeeping, and then proceeds to load the program. This involves retrieving the instructions from the binary, but also hooking up the program to the OS. This usually requires finding the entry point (classically int main(int argc, char** argv)) of the binary, and all points where the program wants to call the OS.

Different Operating Systems use different ways to hook up programs with the OS. As a result, the loading process differs, and the file formats for binaries can differ too. It's not absolute; the ELF format for binaries is used for a number of Operating Systems, and Microsoft uses its PE format on all its current Operating Systems. In both cases, the format does describe the precise format of the binary, so the OS can decide whether the program can be hooked up to the OS. For instance, if it's a Win32 binary, if will be in the PE format. Linux wonn't load that; Windows 2000 will, as will Windows 7-64. A Win64 binary on the other hand is in PE format too, but Windows 2000 will reject it.

MSalters
+5  A: 

It will not run on other processors since 01010110011 means something on x86 and something else on ARM. x86-64 happens to be backwards compatible with x86 so it can run x86 programs.

The binary is in a specific format that your OS understands (windows = PE, mac/linux = ELF)

With any normal binary, your OS loads it into memory and populates a number of fields with certain values. These "certain values" are addresses to api functions that exist in shared libraries (dll, so) such as kernel32 or libc. The API addresses are needed because the binary itself does not know how to access hard drives, network cards, gamepads etc. The program uses these addresses to invoke certain functions that exist in your OS or in other libraries.

In essence, the binary is missing some vital parts that need to be filled by the OS to make everything work. If the OS fills in the wrong parts, the binary won't work since they can't communicate with each other. That's what would happen if you would replace user32.dll with another file, or if you try to run a linux executable on mac osx.

So how does libc know how to open a file?

libc uses syscalls, which is low-level access to the OS core functions. It's sort of like a function call except you do it by populating certain CPU registers and then triggering an interrupt (special CPU instruction)

So how does the OS then know how to open files?

That's one of the things an OS does. But how does it know how to talk to a hard drive? I don't know exactly how that stuff works but I imagine the OS does this by writing/reading certain memory locations which happen to be mapped to BIOS functions.

So how does the BIOS know how to talk to a hard drive?

I don't know that either, I've never done any programming at that level. I imagine the BIOS is hardwired to the hard drive connectors and is able to send the correct sequence of 1 and 0 to talk "SATA" with the hard drive. It can probably only say simple things such as "read this sector"

So how does the hard drive know how to read a sector?

I really don't know this at all so I'll let some hardware guy continue.

Martin
A: 

Also I want to add that OS handles the startup of the program. It prepares process space and initializes it so that program can begin, loads the program instructions and gives control to the program.

Azho KG
A: 

The machine instructions generated by a high-level language will be appropriate for the calling conventions for libraries providing those calls you make, including any system calls (albeit these are usually wrapped in a userspace library somewhere, so specifics about how to make a system call might not be necessary).

Additionally, it will be appropriate for the targetted instruction set architecture, with a few exceptions (care must be taken for example, about assumptions regarding pointer sizes, primitive types, structure layouts, class implementations in C++ etc.).

The file format will dictate the necessary hooks/publically visible functions and data to enable the operating system to execute your code as a process, and to bootstrap the process to the required state. If you're familiar with development for C/C++ under Windows, the concept of subsystem dictates the level of bootstrapping, resources provided, and entry point signature (normally main(int, char **) on most systems).

There are some good examples of how the choice of high-level language, instruction set architecture, and executable file format might affect the ability to run a binary on any given system:

Assembly languages must code for a specific ISA. They use instructions that are specific to a family of CPU types. These instructions may work on other families of CPUs, if those CPUs support the given instruction set. For instance x86 code will work to a degree, on an amd64 operating system, and definitely work on an amd64 CPU running an x86 operating system.

C abstracts much of the specifics of an ISA. A few obvious exceptions include pointer sizes and endianness. Various well-known interfaces, will be provided to an expected level via libc, such as printf, main, fopen, and others. These include the expected register and stack states in order to make these calls, enabling C code to work on different operating systems and architectures without change. Other interfaces can be provided, either directly or by wrapping platform-specific into the expected interface to increase the portability of C code.

Python, and other similar "virtualized" languages operate at yet another level of abstraction, and again with a few exceptions, for instance features that don't exist on particular platforms, or character encoding differences, can run without modification on numerous systems. This is achieved by providing a uniform interface for many different ISA and operating system combinations, at the expense of performance and executable size.

Matt Joiner