Hello!

I'm currently learning C++, and there are some (basic) things I don't really understand and couldn't find anything useful about through various search engines.

  • All operating systems have different "binary formats" for their executables (Windows/Linux/Mac). What are the differences? I mean, all of them are binary, but is there anything (besides the OS APIs) that really differs?

  • (Windows) This is a dumb question, but are all the applications there really just binary (and I mean just 0's and 1's)? In which format are they stored? (You don't see 0's and 1's in text editors, but mainly non-displayable characters.)

Best regards, lamas

+7  A: 

Executables for Windows/Linux differ in:

  • the format of the file headers, i.e. the part of the file that indexes where and what's what in the rest of the file;
  • the instructions required for system calls (interrupts, register contents, etc.);
  • the actual format in which binary code is linked together; there are several different ones for Linux, and I think also for Windows.

Applications are data and machine-language opcodes, crammed into a file. Most bytes in an executable file don't contain text, so they can hold any value between 0 and 255 inclusive, i.e. all possible byte values. That is what people mean when they call a file binary. There are 8 bits in a byte, so each of those bytes could be said to contain 8 binary digits, some of which will be 0 and some 1.
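To make the "8 binary digits per byte" point concrete, here is a minimal C++ sketch that writes out a few bytes as 0s and 1s. The function name `to_bit_string` is made up for this example; the only library piece it relies on is `std::bitset`.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Render each byte as its 8 binary digits, separated by single spaces.
// (Hypothetical helper, written only for illustration.)
std::string to_bit_string(const std::vector<std::uint8_t>& bytes) {
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) out += ' ';
        out += std::bitset<8>(bytes[i]).to_string();
    }
    return out;
}
```

For example, the two bytes 0x4D 0x5A (the letters "MZ" in ASCII) come out as "01001101 01011010".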

Carl Smotricz
You are the top! I wouldn't be able to present such a topic without mentioning bass of dam!
alemjerus
+3  A: 

When you get down to it, every single file in a computer is "binary" in the sense that it is stored as a sequence of 1s and 0s on disk (even text files). When you open a file in a text editor, the editor groups these bits into characters based on various encoding rules. If the file is actually a text file, this will give you readable text. If it is not, the text editor will still faithfully try to decode the stream of bits, but will most likely end up with lots of non-displayable characters, because the bits are not the encoded forms of characters but of CPU instructions.
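A rough way to see this for yourself is to count how many bytes of a file fall outside the printable range. The sketch below assumes plain ASCII for simplicity (a real editor deals with encodings such as UTF-8), and `count_non_printable` is a name invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Count bytes that are neither printable ASCII (0x20..0x7E) nor
// tab, line feed, or carriage return. Assumes ASCII; illustration only.
std::size_t count_non_printable(const std::vector<std::uint8_t>& bytes) {
    std::size_t count = 0;
    for (std::uint8_t b : bytes) {
        const bool printable = (b >= 0x20 && b <= 0x7E);
        const bool whitespace = (b == '\t' || b == '\n' || b == '\r');
        if (!printable && !whitespace) ++count;
    }
    return count;
}
```

A text file will typically give 0 here, while the first few bytes of an executable already contain several values an editor cannot display.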

As for the other part of your question, about "binary formats": there are multiple formats for how to lay out the various parts of an executable, such as ELF or the Windows DLL/EXE format. These all specify exactly where in the file various parts of the executable are (i.e. where the metadata is, where the symbol table is, where the entry point is, where the static data and resources are, etc.)
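As one concrete case of "the format says where things are": in a 64-bit ELF file, the entry point address sits at byte offset 24 of the header. The sketch below reads it from a buffer holding the start of a file. It only handles 64-bit little-endian ELF, and `read_elf64_entry` is a hypothetical helper name, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read the entry-point address from the start of a 64-bit little-endian
// ELF file. Returns false if the buffer is too short or is not that kind
// of file. (Hypothetical helper; other ELF variants are not handled.)
bool read_elf64_entry(const std::vector<std::uint8_t>& file, std::uint64_t& entry) {
    constexpr std::size_t kEntryOffset = 24;  // e_entry in the ELF64 header
    if (file.size() < kEntryOffset + 8) return false;
    // Magic number: 0x7F 'E' 'L' 'F'
    if (file[0] != 0x7F || file[1] != 'E' || file[2] != 'L' || file[3] != 'F') return false;
    // Byte 4: 2 means 64-bit; byte 5: 1 means little-endian
    if (file[4] != 2 || file[5] != 1) return false;
    entry = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        entry |= static_cast<std::uint64_t>(file[kEntryOffset + i]) << (8 * i);
    }
    return true;
}
```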

Yuliy
+2  A: 

The most common file format for Windows is PE; for Linux it is ELF. Both contain mostly the same things (data segment, code segment, etc.) and differ mainly because they were designed separately.

It should be noted that even if both Windows and Linux used the same file format, they would still not be able to run each other's binaries, because the system APIs and available DLLs/SOs are completely different.

BlueRaja - Danny Pflughoeft
+5  A: 

Executable file formats for Windows (PE), Linux (ELF), OS X (Mach-O), etc. tend to be designed to solve common problems, so they all share common features. However, each platform specifies a different standard, so the files are not compatible across platforms, even if the platforms use the same type of CPU.
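You can usually tell these formats apart from the first few bytes of the file (the "magic number"). Here is a small sketch of that idea. The enum and function names are invented for this example, and the PE check is deliberately loose: it only looks for the leading "MZ", which a plain DOS .exe also has. A stricter check would follow the offset stored at 0x3C and look for the "PE\0\0" signature there.

```cpp
#include <cstdint>
#include <vector>

enum class ExeFormat { Unknown, PE, ELF, MachO };

// Guess the executable format from the first bytes of a file.
// (Hypothetical helper; fat/universal Mach-O files are not handled.)
ExeFormat detect_format(const std::vector<std::uint8_t>& head) {
    if (head.size() >= 4) {
        // ELF: 0x7F 'E' 'L' 'F'
        if (head[0] == 0x7F && head[1] == 'E' && head[2] == 'L' && head[3] == 'F') {
            return ExeFormat::ELF;
        }
        // Mach-O: 0xFEEDFACE (32-bit) or 0xFEEDFACF (64-bit), in either byte order
        const std::uint32_t big =
            (static_cast<std::uint32_t>(head[0]) << 24) |
            (static_cast<std::uint32_t>(head[1]) << 16) |
            (static_cast<std::uint32_t>(head[2]) << 8) |
            static_cast<std::uint32_t>(head[3]);
        const std::uint32_t little =
            (static_cast<std::uint32_t>(head[3]) << 24) |
            (static_cast<std::uint32_t>(head[2]) << 16) |
            (static_cast<std::uint32_t>(head[1]) << 8) |
            static_cast<std::uint32_t>(head[0]);
        for (std::uint32_t magic : {0xFEEDFACEu, 0xFEEDFACFu}) {
            if (big == magic || little == magic) return ExeFormat::MachO;
        }
    }
    // PE (and old DOS .exe): starts with 'M' 'Z'
    if (head.size() >= 2 && head[0] == 'M' && head[1] == 'Z') {
        return ExeFormat::PE;
    }
    return ExeFormat::Unknown;
}
```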

Executable file formats are used not only for executable files but also for libraries, which also contain code but are never run directly by the user; they are only loaded into memory to satisfy the needs of directly executable binaries.

Common features of an executable file format:

  • One or more blocks of executable code
  • One or more blocks of read-only data such as text and numbers
  • One or more blocks of read/write data
  • Instructions on where to place these blocks in memory when the application is run
  • Instructions on what libraries (which are also in an 'executable file format') need to be loaded as well, and how they connect (link) up to this executable file.
  • One or more tables mapping code and data locations to strings or ids that describe them, useful for linking and debugging.
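The list above could be modelled in C++ roughly as follows. This is not the layout of PE, ELF, or Mach-O; every type and field name here is hypothetical and only mirrors the bullet points.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of the features listed above; not a real file format.
enum class BlockKind { Code, ReadOnlyData, ReadWriteData };

struct Block {
    BlockKind kind;                   // executable code, read-only or read/write data
    std::uint64_t load_address;       // where to place the block in memory
    std::vector<std::uint8_t> bytes;  // the block's contents
};

struct Symbol {
    std::string name;                 // string describing a location
    std::uint64_t address;            // the code or data location it refers to
};

struct ExecutableImage {
    std::vector<Block> blocks;
    std::vector<std::string> needed_libraries;  // libraries to load as well
    std::vector<Symbol> symbols;                // used for linking and debugging
};

// Return the name recorded for an exact address, or an empty string if none.
std::string symbol_at(const ExecutableImage& image, std::uint64_t address) {
    for (const Symbol& s : image.symbols) {
        if (s.address == address) return s.name;
    }
    return {};
}
```

A debugger does something similar in spirit to `symbol_at` when it turns a raw address back into a function name.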

It's interesting to compare such formats to more basic ones, such as the venerable DOS .com file, which simply holds up to roughly 64K of assorted 'stuff' to be loaded at the next available location, and has few of the features listed above.

'Binary' in this sense is used to contrast these files with 'source' files, which are written in text format. Calling a format binary simply says that it is encoded in a non-text way, and doesn't really relate to the 0-and-1 sense of binary.

Alex Brown