I somehow have the feeling that modern systems, including runtime libraries, this exception handler and that built-in debugger, build up more and more layers between my (C++) programs and the CPU/the rest of the hardware.

I'm thinking of something like this:

1 + 2 >> OS top layer >> Runtime library/helper/error handler >> a hell of a lot of DLL modules >> OS kernel layer >> Do you really want to run 1 + 2?-Windows popup (don't take this seriously) >> OS kernel layer >> Hardware abstraction >> Hardware >> Go through at least 100 miles of circuits >> Eventually arrive at the CPU >> ADD 1, 2 >> Go all the way back to my program

Nearly all the technical details there are simply wrong and in some random order, but you get my point, right?

  • How much longer/shorter is this chain when I run a C++ program that calculates 1 + 2 at runtime on Windows?

  • How about when I do this in an interpreter? (Python|Ruby|PHP)

  • Is this chain really as dramatic in reality? Does Windows really try "not to stand in the way", e.g. a direct connection between my binary and the hardware?

A: 

It doesn't matter how many levels of abstraction there are, as long as the hard work is done in the most efficient way.

In a general sense you only suffer when you are "emulating" your lowest level, e.g. emulating a 68K CPU on an x86 CPU in some poorly implemented app, but even that won't perform worse than the original hardware would; otherwise you would not emulate it in the first place. E.g. today most user interface logic is implemented in high-level dynamic scripting languages because it's more productive, while the hardcore stuff is handled by optimized low-level code.

When it comes to performance, it's always the hard work that hits the wall first; the glue in between rarely suffers from performance issues. E.g. a key handler that processes 2-3 key presses a second can spend a fortune in badly written code without affecting the end-user experience, while the motion estimator in an MPEG encoder will fail utterly just by being implemented in software instead of in dedicated hardware.
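
A back-of-the-envelope illustration of that point, with made-up numbers (these are assumptions for the sake of the example, not measurements): a wasteful key handler is invisible, while a tiny per-pixel overhead in an encoder's hot loop is fatal.

    #include <cstdio>

    int main() {
        // Hypothetical figures, chosen only to show where the time actually goes.
        constexpr double key_events_per_sec  = 3.0;                // "cold path": a key handler
        constexpr double wasted_ms_per_event = 1.0;                // 1 ms of badly written code per key press
        constexpr double pixels_per_sec      = 1920.0 * 1080 * 30; // "hot path": 1080p video at 30 fps
        constexpr double extra_ns_per_pixel  = 100.0;              // a tiny overhead added per pixel

        double key_cost   = key_events_per_sec * wasted_ms_per_event / 1000.0; // CPU-seconds per second
        double pixel_cost = pixels_per_sec * extra_ns_per_pixel / 1e9;         // CPU-seconds per second

        std::printf("wasteful key handler: %.3f CPU-seconds per second\n", key_cost);   // ~0.003: nobody notices
        std::printf("100 ns per pixel:     %.3f CPU-seconds per second\n", pixel_cost); // ~6.2: cannot keep up
    }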

Ernelli
+3  A: 

"1 + 2" in C++ gets directly translated in an add assembly instruction that is executed directly on the CPU. All the "layers" you refer to really only come into play when you start calling library functions. For example a simple printf("Hello World\n"); would go through a number of layers (using Windows as an example, different OSes would be different):

  1. CRT - the C runtime implements things like %d replacements and creates a single string, then it calls WriteFile in kernel32
  2. kernel32.dll implements WriteFile, notices that the handle is a console and directs the call to the console system
  3. the string is sent to the conhost.exe process (on Windows 7, csrss.exe on earlier versions) which actually hosts the console window
  4. conhost.exe adds the string to an internal buffer that represents the contents of the console window and invalidates the console window
  5. The Window Manager notices that the console window is now invalid and sends it a WM_PAINT message
  6. In response to the WM_PAINT, the console window (inside conhost.exe still) makes a series of DrawString calls inside GDI32.dll (or maybe GDI+?)
  7. The DrawString method loops through each character in the string and:
    1. Looks up the glyph definition in the font file to get the outline of the glyph
    2. Checks its cache for a rendered version of that glyph at the current font size (a simplified sketch of this kind of cache follows the list)
    3. If the glyph isn't in the cache, it rasterizes the outline at the current font size and caches the result for later
    4. Copies the pixels from the rasterized glyph to the graphics buffer you specified, pixel-by-pixel
  8. Once all the DrawString calls are complete, the final image for the window is sent to the DWM where it's loaded into the graphics memory of your graphics card, and replaces the old window
  9. When the next frame is drawn, the graphics card now uses the new image to render the console window and your new string is there
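
A very rough sketch of the caching pattern described in steps 7.2-7.3 (the types and names here are invented for illustration; the real font engine is far more involved):

    #include <map>
    #include <utility>
    #include <vector>

    // Hypothetical stand-ins for the real font machinery, just to show the caching idea.
    struct Bitmap { std::vector<unsigned char> pixels; };

    Bitmap rasterizeOutline(char glyph, int fontSize) {
        // Placeholder for the expensive outline-to-pixels step (step 7.3).
        return Bitmap{ std::vector<unsigned char>(static_cast<size_t>(fontSize) * fontSize, glyph) };
    }

    std::map<std::pair<char, int>, Bitmap> glyphCache;  // key: (character, font size)

    const Bitmap& getGlyph(char glyph, int fontSize) {
        auto key = std::make_pair(glyph, fontSize);
        auto it = glyphCache.find(key);                 // step 7.2: look in the cache first
        if (it == glyphCache.end()) {
            it = glyphCache.emplace(key, rasterizeOutline(glyph, fontSize)).first; // step 7.3: rasterize once, keep it
        }
        return it->second;                              // step 7.4 then copies these pixels to the target buffer
    }

    int main() {
        const Bitmap& first  = getGlyph('A', 12);       // rasterized on first use
        const Bitmap& second = getGlyph('A', 12);       // served from the cache
        return &first == &second ? 0 : 1;
    }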

Now there are a lot of layers that I've simplified (e.g. the way the graphics card renders stuff is a whole 'nother layer of abstractions). I may have made some errors (I don't know the internals of how Windows is implemented, obviously), but it should hopefully give you an idea.
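
To make steps 1 and 2 concrete, here is a minimal sketch of what the CRT call boils down to: do the %d formatting in user space, then hand the finished bytes to WriteFile in kernel32. Error handling and the CRT's own buffering are omitted on purpose, just to make the layer boundary visible.

    #include <windows.h>
    #include <cstdio>

    int main() {
        // Step 1: CRT-level work -- format the string inside our own process.
        char buffer[64];
        int len = std::snprintf(buffer, sizeof(buffer), "1 + 2 = %d\n", 1 + 2);

        // Step 2: hand the finished bytes to kernel32, which notices the handle
        // is a console and routes them on to the console host.
        HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD written = 0;
        WriteFile(out, buffer, static_cast<DWORD>(len), &written, nullptr);
        return 0;
    }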

The important point, though, is that each step along the way adds some kind of value to the system.
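
For contrast, the "1 + 2" case from the start of this answer involves none of those steps. A sketch (the exact instructions depend on your compiler and calling convention, and a literal 1 + 2 is normally folded to 3 at compile time, so the function takes arguments to force a real add):

    int add(int a, int b) {
        // With optimizations this typically compiles to a single add (or lea)
        // instruction plus a ret -- no OS, no DLLs, no window manager involved.
        return a + b;
    }

    int main() {
        return add(1, 2) == 3 ? 0 : 1;  // the whole "chain" is a handful of instructions
    }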

Dean Harding
++ That's a good explanation. I would add that the IDE debugger doesn't cost any performance (unless you add a data breakpoint). Also, interpreted languages are generally 1-2 orders of magnitude slower than compiled code. Depending on the application, that may or may not be a problem.
Mike Dunlavey
A: 

As codeka said, there's a lot that goes on when you call a library function, but what you need to remember is that printing a string or displaying a JPEG or whatever is a very complicated task, even more so when the method used must work for everyone in every situation and handle hundreds of edge cases.

What this really means is that when you're writing serious number-crunching, chess-playing, weather-predicting code, don't call library functions in the hot path. Instead use only cheap operations that can and will be executed by the CPU directly. Additionally, planning where your expensive calls happen can make a huge difference (print everything at the end, not each time through the loop).
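
A small sketch of that last point (the workload is hypothetical; it's just there to show the pattern): keep the hot loop limited to cheap in-memory work and pay for the expensive library/OS call once, at the end.

    #include <cstdio>
    #include <string>

    int main() {
        std::string report;
        for (int i = 0; i < 1000000; ++i) {
            int result = i + i;                 // the cheap "real work" the CPU executes directly
            // std::printf("%d\n", result);     // expensive: a library + OS round trip every iteration
            report += std::to_string(result);   // cheap by comparison: stays in this process's memory
            report += '\n';
        }
        std::fputs(report.c_str(), stdout);     // one expensive call, after the number crunching is done
    }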

Daniel