I have a question for all the hardcore low level hackers out there. I ran across this sentence in a blog. I don't really think the source matters (it's Haack if you really care) because it seems to be a common statement.

For example, many modern 3-D Games have their high performance core engine written in C++ and Assembly.

As far as the assembly goes - is the code written in assembly because you don't want a compiler emitting extra instructions or using excessive bytes, or are you using better algorithms that you can't express in C (or can't express without the compiler mussing them up)?

I completely get that it's important to understand the low-level stuff. I just want to understand the why program in assembly after you do understand it.

Edit: Wow! Thank you all for the fantastic responses! This was way better than I had hoped for!

+12  A: 

Usually, a layman's assembly is slower than C (due to C's optimization) but many games (I distinctly remember Doom) had to have specific sections of the game in Assembly so it would run smoothly on normal machines.

Here's the example to which I am referring.

Ólafur Waage
+1 Very true. Humans are very bad at writing long asm code.
Dead account
Keep in mind that said tools were not always available when the assembler was written.
Marco van de Voort
+29  A: 

I have not coded in assembly language for many years, but I can give several reasons that I frequently saw:

  • Not all compilers can make use of certain CPU optimizations and instruction set (e.g., the new instruction sets that Intel adds once in a while). Waiting for compiler writers to catch up means losing a competitive advantage.

  • Easier to match actual code to known CPU architecture and optimization. For example, things you know about the fetching mechanism, caching, etc. This is supposed to be transparent to the developer, but the fact is that it is not, that's why compiler writers can optimize.

  • Certain hardware level accesses are only possible/practical via assembly language (e.g., when writing device driver).

  • Formal reasoning is sometimes actually easier for the assembly language than for the high-level language since you already know what the final or almost final layout of the code is.

  • Programming certain 3D graphic cards (circa late 1990s) in the absence of APIs was often more practical and efficient in assembly language, and sometimes not possible in other languages. But again, this involved really expert-level games based on the accelerator architecture like manually moving data in and out in certain order.

I doubt many people use assembly language when a higher-level language would do, especially when that language is C. Hand-optimizing large amounts of general-purpose code is impractical.

+1  A: 

If you were around for all the Y2K remediation efforts, you could have made a lot of money if you knew Assembly. There's still plenty of legacy code around that was written in it, and that code occasionally needs maintenance.

+8  A: 

"Yes". But, understand that for the most part the benefits of writing code in assembler are not worth the effort. The return received for writing it in assembly tends to be smaller than the simply focusing on thinking harder about the problem and spending your time thinking of a better way of doing thigns.

John Carmack and Michael Abrash who were largely responsible for writing Quake and all of the high performance code that went into IDs gaming engines go into this in length detail in this book.

I would also agree with Ólafur Waage that today, compilers are pretty smart and often employ many techniques which take advantage of hidden architectural boosts.

+11  A: 

There is one aspect of assembler programming which others have not mentioned - the feeling of satisfaction you get knowing that every single byte in an application is the result of your own effort, not the compiler's. I wouldn't for a second want to go back to writing whole apps in assembler as I used to do in the early 80s, but I do miss that feeling sometimes...

Heh, it's the result of the assembler work! You write lots of macros in asm usually.
Mehrdad Afshari
Not just satisfaction, but an appreciation of precision. A concise process with everything about it declared is a joy to behold.
+2  A: 

Another reason could be when the available compiler just isn't good enough for an architecture and the amount of code needed in the program is not that long or complex as for the programmer to get lost in it. Try programming a microcontroller for an embedded system, usually assembly will be much easier.

+7  A: 

SSE code works better in assembly than compiler intrinsics, at least in MSVC. (i.e. does not create extra copies of data )

Marcus Lindblom
Good point, you need a compiler that does a decent job with intrinsics. The Intel and Gnu compilers are quite good, I don't know if the latest from PGI and PathScale are competitive yet, they didn't used to be.
+2  A: 

Beside other mentioned things, all higher languages have certain limitations. Thats why some people choose to programm in ASM, to have full control over their code.

Others enjoy very small executables, in the range of 20-60KB, for instance check HiEditor, which is implemented by author of the HiEdit control, superb powerfull edit control for Windows with syntax highlighting and tabs in only ~50kb). In my collection I have more then 20 such gold controls from Excell like ssheets to html renders.

+3  A: 

I think a lot of game developers would be surprised at this bit of information.

Most games I know of use as little assembly as at all possible. In some cases none at all, and at worst, one or two loops or functions.

That quote is over-generalized, and nowhere near as true as it was a decade ago.

But hey, mere facts shouldn't hinder a true hacker's crusade in favor of assembly. ;)

+2  A: 

Last time I wrote in assembler was when I could not convince the compiler to generate libc-free, position independent code.

Next time will probably be for the same reason.

Of course, I used to have other reasons.

+9  A: 

I started professional programming in assembly language in my very first job (80's). For embedded systems the memory demands - RAM and EPROM - were low. You could write tight code that was easy on resources.

By the late 80's I had switched to C. The code was easier to write, debug and maintain. Very small snippets of code were written in assembler - for me it was when I was writing the context switching in an roll-your-own RTOS. (Something you shouldn't do anymore unless it is a "science project".)

You will see assembler snippets in some Linux kernel code. Most recently I've browsed it in spinlocks and other synchronization code. These pieces of code need to gain access to atomic test-and-set operations, manipulating caches, etc.

I think you would be hard pressed to out-optimize modern C compilers for most general programming.

I agree with @altCognito that your time is probably better spent thinking harder about the problem and doing things better. For some reason programmers often focus on micro-efficiencies and neglect the macro-efficiencies. Assembly language to improve performance is a micro-efficiency. Stepping back for a wider view of the system can expose the macro problems in a system. Solving the macro problems can often yield better performance gains. Once the macro problems are solved then collapse to the micro level.

I guess micro problems are within the control of a single programmer and in a smaller domain. Altering behavior at the macro level requires communication with more people - a thing some programmers avoid. That whole cowboy vs the team thing.

+70  A: 

I think you're misreading this statement:

For example, many modern 3-D Games have their high performance core engine written in C++ and Assembly.

Games (and most programs these days) aren't "written in assembly" the same way they're "written in C++". That blog isn't saying that a significant fraction of the game is designed in assembly, or that a team of programmers sit around and develop in assembly as their primary language.

What this really means is that developers first write the game and get it working in C++. Then they profile it, figure out what the bottlenecks are, and if it's worthwhile they optimize the heck out of them in assembly. Or, if they're already experienced, they know which parts are going to be bottlenecks, and they've got optimized pieces sitting around from other games they've built.

The point of programming in assembly is the same as it always has been: speed. It would be ridiculous to write a lot of code in assembler, but there are some optimizations the compiler isn't aware of, and for a small enough window of code, a human is going to do better.

For example, for floating point, compilers tend to be pretty conservative and may not be aware of some of the more advanced features of your architecture. If you're willing to accept some error, you can usually do better than the compiler, and it's worth writing that little bit of code in assembly if you find that lots of time is spent on it.

Here are some more relevant examples:

Examples from Games

  • Article from Intel about optimizing a game engine using SSE intrinsics. The final code uses intrinsics (not inline assembler), so the amount of pure assembly is very small. But they look at the assembler output by the compiler to figure out exactly what to optimize.

  • Quake's fast inverse square root. Again, the routine doesn't have assembler it it, but you need to know something about architecture to do this kind of optimization. The authors know what operations are fast (mulitply, shift) and which are slow (divide, sqrt). So they come up with a very tricky implementation of square root that avoids the slow operations entirely.

High-Performance Computing

  • Outside the domain of games, people in scientific computing frequently optimize the crap out of things to get them to run fast on the latest hardware. Think of this as games where you can't cheat on the physics.

    A great recent example of this is Lattice Quantum Chromodynamics (Lattice QCD). This paper describes how the problem pretty much boils down to one very small computational kernel, which was optimized heavily for PowerPC 440's on an IBM Blue Gene/L. Each 440 has two FPUs, and they support some special ternary operations that are tricky for compilers to exploit. Without these optimizations, Lattice QCD would've run much slower, which is costly when your problem requires millions of CPU hours on expensive machines.

    If you are wondering why this is important, check out the article in Science that came out of this work. Using Lattice QCD, these guys calculated the mass of a proton from first principles, and showed last year that 90% of the mass comes from strong force binding energy, and the rest from quarks. That's E=mc2 in action. Here's a summary.

For all of the above, the applications are not designed or written 100% in assembly -- not even close. But when people really need speed, they focus on writing the key parts of their code to fly on specific hardware.

amazing response. Wish we could put this in a wiki!
@Paperino ... you can. Questions and answers on StackOverflow are licensed creative commons attribution.
Aaron Maenpaa
+5  A: 

Some instructions/flags/control simply aren't there at the C level.

For example, checking for overflow on x86 is the simple overflow flag. This option is not available in C.

You can compute overflow flags in C with bit operations.
@swegi: I bet that's insignificantly slower.
how often is that useful? and when it is, it can't possibly be the only reason for dropping into assembler.
+2  A: 

Aside from very small projects on very small CPUs, I would not set out to ever program an entire project in assembly. However, it is common to find that a performance bottleneck can be relieved with the strategic hand coding of some inner loops.

In some cases, all that is really required is to replace some language construct with an instruction that the optimizer cannot be expected to figure out how to use. A typical example is in DSP applications where vector operations and multiply-accumulate operations are difficult for an optimizer to discover, but easy to hand code.

For example certain models of the SH4 contain 4x4 matrix and 4 vector instructions. I saw a huge performance improvement in a color correction algorithm by replacing equivalent C operations on a 3x3 matrix with the appropriate instructions, at the tiny cost of enlarging the correction matrix to 4x4 to match the hardware assumption. That was achieved by writing no more than a dozen lines of assembly, and carrying matching adjustments to the related data types and storage into a handful of places in the surrounding C code.

+4  A: 

These days, for sequential codes at least, a decent compiler almost always beats even a highly seasoned assembly-language programmer. But for vector codes it's another story. Widely deployed compilers don't do such a great job exploiting the vector-parallel capabilities of the x86 SSE unit, for example. I'm a compiler writer, and exploiting SSE tops my list of reasons to go on your own instead of trusting the compiler.

Norman Ramsey
In that case, I'd use a compiler intrinsic.
Mehrdad Afshari
Still not the same. It's like a compiler without register optimizer
Marco van de Voort
+3  A: 

Defects tend to run per-line (statement, code point, etc.); while it's true that for most problems, assembly would use far more lines than higher level languages, there are occasionally cases where it's the best (most concise, fewest lines) map to the problem at hand. Most of these cases involve the usual suspects, such as drivers and bit-banging in embedded systems.

Justin Love
+1  A: 

No longer speed, but Control. Speed will sometimes come from control, but it is the only reason to code in assembly. Every other reason boils down to control (i.e. SSE and other hand optimization, device drivers and device dependent code, etc.).

Kelden Cowan
+2  A: 

If you are programming a low end 8 bit microcontroller with 128 bytes of RAM and 4K of program memory you don't have much choice about using assembly. Sometimes though when using a more powerful microcontroller you need a certain action to take place at an exact time. Assembly language comes in useful then as you can count the instructions and so measure the clock cycles used by your code.

+2  A: 

It doesn't seem to be mentioned, so I thought I'd add it: in modern games development, I think at least some of the assembly being written isn't for the CPU at all. It's for the GPU, in the form of shader programs.

This might be needed for all sorts of reasons, sometimes simply because whatever higher-level shading language used doesn't allow the exact operation to be expressed in the exact number of instructions wanted, to fit some size-constraint, speed, or any combination. Just as usual with assembly-language programming, I guess.

+2  A: 

Almost every medium-to-large game engine or library I've seen to date has some hand-optimized assembly versions available for matrix operations like 4x4 matrix concatenation. It seems that compilers inevitably miss some of the clever optimizations (reusing registers, unrolling loops in a maximally efficient way, taking advantage of machine-specific instructions, etc) when working with large matrices. These matrix manipulation functions are almost always "hotspots" on the profile, too.

I've also seen hand-coded assembly used a lot for custom dispatch -- things like FastDelegate, but compiler and machine specific.

Finally, if you have Interrupt Service Routines, asm can make all the difference in the world -- there are certain operations you just don't want occurring under interrupt, and you want your interrupt handlers to "get in and get out fast"... you know almost exactly what's going to happen in your ISR if it's in asm, and it encourages you to keep the bloody things short (which is good practice anyway).

+1  A: 

The only assembler coding I continue to do is for embedded hardware with scant resources. As leander mentions, assembly is still well suited to ISRs where the code needs to be fast and well understood.

A secondary reason for me is to keep my knowledge of assembly functional. Being able to examine and understand the steps which the CPU is taking to do my bidding just feels good.

+2  A: 

I've three or four assembler routines (in about 20 MB source) in my sources at work. All of them are SSE(2), and are related to operations on (fairly large - think 2400x2048 and bigger) images.

For hobby, I work on a compiler, and there you have more assembler. Runtime libraries are quite often full of them, most of them have to do with stuff that defies the normal procedural regime (like helpers for exceptions etc.)

I don't have any assembler for my microcontroller. Most modern microcontrollers have so much peripheral hardware (interrupt controled counters, even entire quadrature encoders and serial building blocks) that using assembler to optimize the loops is often not needed anymore. With current flash prices, the same goes for code memory. Also there are often ranges of pin-compatible devices, so upscaling if you systematically run out of cpu power or flash space is often not a problem

Unless you really ship 100000 devices and programming assembler makes it possible to really make major savings by just fitting in a flash chip a category smaller. But I'm not in that category.

A lot of people think embedded is an excuse for assembler, but their controllers have more CPU power than the machines Unix was developed on. (Microchip coming with 40 and 60 MIPS microcontrollers for under USD 10).

However a lot people are stuck with legacy, since changing microchip architecture is not easy. Also the HLL code is very architecture dependent (because it uses the hardware periphery, registers to control I/O, etc). So there are sometimes good reasons to keep maintaining a project in assembler (I was lucky to be able to setup affairs on a new architecture from scratch). But often people kid themselves that they really need the assembler.

Marco van de Voort
+1  A: 

Games are pretty performance hungry and although in the meantime the optimizers are pretty good a "master programmer" is still able to squeeze out some more performance by hand coding the right parts in assembly.

Never ever start optimizing your program without profiling it first. After profiling should be able to identify bottlenecks and if finding better algorithms and the like don't cut it anymore you can try to hand code some stuff in assembly.

+2  A: 

I have only personally talked to one developer about his use of assembly. He was working on the firmware that dealt with the controls for a portable mp3 player. Doing the work in assembly had 2 purposes:

  1. Speed: delays needed to be minimal.
  2. Cost: by being minimal with the code, the hardware needed to run it could be slightly less powerful. When mass-producing millions of units, this can add up.
Paul Williams
+1  A: 

If I am able to outperform GCC and Visual C++ 2008 (known also as Visual C++ 9.0) then people will be interested in interviewing me about how it is possible.

This is why for the moment I just read things in assembly and just write __asm int 3 when required.

I hope this help...

+1  A: 

I've not written in assembly for a few years, but the two reasons I used to were:

  • The challenge of the thing! I went through a several-month period years ago when I'd write everything in x86 assembly (the days of DOS and Windows 3.1). It basically taught me a chunk of low level operations, hardware I/O, etc.
  • For some things it kept size small (again DOS and Windows 3.1 when writing TSRs)

I keep looking at coding assembly again, and it's nothing more than the challenge and joy of the thing. I have no other reason to do so :-)

Chris J

The Dalvik VM that interprets the bytecode for Java applications on Android phones uses assembler for the dispatcher. This movie (about 31 minutes in, but its worth watching the whole movie!) explains how

"there are still cases where a human can do better than a compiler".


I don't, but I've made it a point to at least try, and try hard at some point in the furture (soon hopefully). It can't be a bad thing to get to know more of the low level stuff and how things work behind the scenes when I'm programming in a high level language. Unfortunately time is hard to come by with a full time job as a developer/consultant and a parent. But I will give at go in due time, that's for sure.