Possible Duplicate:
When is assembler faster than C?

Hello,

This is purely a theoretical question: given "infinite" time to write a trivial program, and advanced knowledge of both C and assembly, is it really better to do it in assembly? Is "performance" lost when compiling C to assembly (and then to machine code)?

By performance I mean: do modern C compilers do a bad job at certain tasks that programming directly in assembly would speed up?

Thank you.

+8  A: 

Modern C compilers can do a better job than hand-written assembly in many cases, because keeping track of which operations can overlap and which will block others is so complex that it can only reasonably be tracked by a computer.

Mark Ransom
@Mark: I don't disagree with the sentiment of this answer, but why can't an assembler make the same optimizations regarding instruction scheduling as a C compiler?
indiv
@indiv: They can, in theory. In practice, it can be insanely difficult, particularly for RISC and RISC-like CPUs (which, these days, is essentially all of them).
Steven Sudit
Yes, you can do anything that the C compiler can do, it's just that *you* have to do it. Have fun =)
Ed Swangren
@Steven Out of curiosity, do you have a link with statistics about how many RISC CPUs there are? It was my understanding that most desktops/laptops are, for the most part, some variant of x86 and thus CISC
docgnome
I couldn't find any info on the current state of assemblers (probably it's pretty bleak), but I did find a paper on Intel's IA-64 Assembly Assistant that would optimize instruction scheduling (page 8). The paper also discusses some limitations of optimizing assembly vs C if anyone is interested. http://download.intel.com/technology/itj/q41999/pdf/assemble.pdf. And @Ed, I was referring to optimizations that can be done by the assembler when translating assembly to machine code.
indiv
@docgnome: Even supposedly CISC chips, like the x86, have adopted many RISC techniques. For example, it used to be faster to use complex instructions to move bytes en masse (MOVSW, etc.), but now it's faster to use RISC-like load/store techniques.
Steven Sudit
@Steven Sudit: That was true on the Pentium, but by the time of the Pentium Pro `rep movsw` was faster again. Those instructions are still being improved - e.g. see here: http://lkml.org/lkml/2009/11/6/66
caf
@caf: Thank you for the interesting link. Looks like things have come around in at least that regard. I suspect my point about the difficulty of hand-scheduling ops to take full advantage of pipelining still stands, though.
Steven Sudit
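To make the `rep movsw` discussion above concrete, here is a minimal sketch of the kind of hand-written string copy that would be benchmarked against the compiler's memcpy. It assumes x86 (32- or 64-bit) with GCC extended inline asm; the function names are made up for illustration and this is not a tuned implementation.

    #include <stddef.h>
    #include <string.h>

    /* Illustrative only: copy n bytes with the x86 string-move instruction.
       rep movsb expects the destination in (R/E)DI, the source in (R/E)SI and
       the count in (R/E)CX, which is what the "+D", "+S" and "+c" constraints
       ask the compiler to arrange. */
    static void copy_rep_movsb(void *dst, const void *src, size_t n)
    {
        __asm__ volatile ("rep movsb"
                          : "+D" (dst), "+S" (src), "+c" (n)
                          :
                          : "memory");
    }

    /* The baseline it would be compared against: let the compiler/libc decide. */
    static void copy_libc(void *dst, const void *src, size_t n)
    {
        memcpy(dst, src, n);
    }
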
+4  A: 

Unless you are an assembly expert and/or taking advantage of advanced opcodes the compiler doesn't use, the C compiler will likely win.

Try it for fun ;-)

A more realistic approach is often to let the C compiler do its bit, then profile and, if needed, tweak specific sections -- many compilers can dump some sort of low-level IL (or even the generated assembly).

pst
You can compile the C and look at the Assembly Language output in the debugger. This lets you tweak the C and repeat the process until you've gotten the compiler to generate the code you want.
Steven Sudit
You can also generate assembly from arbitrary object code with objdump. Compiler support is not necessary.
Nathon
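As a concrete sketch of that workflow, compile a trivial function and look at what the compiler emits. The file name, function, and commands below are only an illustration and assume GCC on a Unix-like system.

    /* sum.c -- a trivial function whose generated code is easy to inspect.
     *
     * Typical ways to look at the output (assuming GCC):
     *   gcc -O2 -S sum.c                      # writes the assembly to sum.s
     *   gcc -O2 -c sum.c && objdump -d sum.o  # disassembles the object file
     */
    int sum(const int *a, int n)
    {
        int total = 0;
        int i;
        for (i = 0; i < n; ++i)
            total += a[i];
        return total;
    }
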
+2  A: 

Actually, C might be faster than hand-written assembly in many cases, since compilers apply optimizations to your code. Even so, the performance difference (if any) is negligible.

I would focus more on readability & maintainability of the code base, as well as whether what you are trying to do is supported in C. In many cases, assembly will allow you to do more low-level things that C simply cannot do. For example, with assembly you can take advantage of MMX or SSE instructions directly.

So in the end, focus on what you want to accomplish. Remember - assembly language code is terrible to maintain. Use it only when you have no other choice.

dacris
+4  A: 

Use C for most tasks, and write inline assembly for specific ones (for example, to take advantage of SSE, MMX, ...).

Yassin
Agreed. A friend and I were diddling around with square rooty things the other day, and they were able to write some assembly to take advantage of the XMM intrinsics: it blew the compiled code out of the water.
Paul Nathan
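As a rough sketch of what that can look like in practice, here is a tiny GCC extended-inline-asm function that square-roots four floats at once with the SSE `sqrtps` instruction. It assumes an x86 target with SSE; the function name is invented and the snippet is illustrative, not tuned.

    #include <stdio.h>

    /* Square-root four packed single-precision floats with one instruction. */
    static void sqrt4(const float *in, float *out)
    {
        __asm__ (
            "movups (%1), %%xmm0\n\t"   /* unaligned load of 4 floats      */
            "sqrtps %%xmm0, %%xmm0\n\t" /* 4 single-precision square roots */
            "movups %%xmm0, (%0)\n\t"   /* store the results               */
            :
            : "r" (out), "r" (in)
            : "xmm0", "memory");
    }

    int main(void)
    {
        float v[4] = { 1.0f, 4.0f, 9.0f, 16.0f }, r[4];
        sqrt4(v, r);
        printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);
        return 0;
    }
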
The question asked here was theoretical, not practical.
Lajla
+6  A: 

This question seems to stem from the misconception that higher performance is automatically better. There is too much to be gained from a higher-level perspective to make assembly better in the general case. Even if performance is your primary concern, compilers usually do a better job of creating efficient assembly than you could writing it yourself. They have a much broader "understanding" of all of your source code than you could possibly hold in your head. Many of those optimizations come precisely from NOT producing well-structured assembly.

Obviously there are exceptions. If you need to access hardware directly, including special processing features of CPUs (e.g. SSE), then assembly is the way to go. However, in that case, you're probably better off using a library that addresses your general problem more directly (e.g. numerics packages).

But you should only worry about things like this if you have a concrete, specific need for the increased performance and you can show that your assembly actually IS faster. Concrete, specific needs include: noticed and measured performance problems, embedded systems where performance is a fundamental design concern, etc.

Cogwheel - Matthew Orlando
I agree with your point. On a lighter note -- **"all of your source code than you could possibly hold in your head"** -- and then they say this about human memory: http://www.effective-mind-control.com/human-memory-capacity.html
Praveen S
I see your point, however I meant that statement from a slightly different angle. It's not so much a matter of memory as it is an intuitive awareness of all the different interactions among the various systems. Think L1 cache instead of Flash memory. :)
Cogwheel - Matthew Orlando
+4  A: 

C is not inefficient compared to anything. C is a language, and we don't describe languages in terms of efficiency. We compare programs in terms of efficiency. C doesn't write programs; programmers write programs.

Assembly gives you immense flexibility compared with C, and that comes at the cost of programming time. If you are both a guru C programmer and a guru assembly programmer, then chances are you might be able to squeeze some more juice out of assembly for any given program, but the price for that is virtually certain to be prohibitive.

Most of us aren't gurus in either of these languages. For most of us, handing the responsibility for performance tuning to a C compiler is a double win: you get the wisdom of a number of assembly gurus, the people who wrote the C compiler, along with an immense amount of time on your hands to further correct and enhance your C program. You also get portability as a bonus.

wilhelmtell
+2  A: 

Ignoring how much time it would take to write the code, and assuming you have all the knowledge required to do any task as efficiently as possible in both languages, assembly code will, by definition, always be able to match or outperform the code generated by a C compiler. The compiler has to produce assembly that does the same task, and it cannot optimize everything; anything the compiler writes, you could (in theory) also write yourself, and unlike the compiler, you can sometimes take a shortcut because you know more about the situation than can be expressed in C code.

However, that doesn't mean compilers do a bad job or that their code is too slow; just that it's slower than it could be. The difference may be no more than a few microseconds, but it can still be slower.

What you have to remember is that some optimizations performed by a compiler are very complex: aggressive optimization tends to produce very unreadable assembly, and if you were to apply those optimizations manually, the code would become much harder to reason about. That's why you'd normally write it in C (or some other language) first, profile it to find problem areas, and then hand-optimize those pieces of code until they reach an acceptable speed -- because the cost of writing everything in assembly is much higher, while often providing little or no benefit.

Michael Madsen
+2  A: 

It depends. C compilers for Intel do a pretty good job nowadays. I wasn't so impressed by compilers for ARM - I could easily write an assembly version of an inner loop that performed twice as fast. You typically don't need assembly on x86 machines. If you want direct access to SSE instructions, look into compiler intrinsics!

Cornelius Scarabeus
Also a great point. x86 compilers are good. Other architectures, maybe not so good.
Paul Nathan
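As a minimal sketch of the intrinsics approach mentioned above (assuming an x86 target with SSE, e.g. gcc -msse, and xmmintrin.h; the function name is made up): you name the SIMD operations you want, and the compiler still handles register allocation and instruction scheduling.

    #include <xmmintrin.h>   /* SSE intrinsics */

    /* Add four pairs of floats in one packed operation. */
    void add4(const float *a, const float *b, float *out)
    {
        __m128 va = _mm_loadu_ps(a);             /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b);
        _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* packed add, then store     */
    }
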
A: 

Given infinite time and an extremely deep understanding of how a modern CPU works, you can actually write the "perfect" program (i.e. the best performance possible on that machine), but you will have to consider, for every instruction in your program, how the CPU behaves in that context, pipelining- and caching-related optimizations, and many, many other things. A compiler is built to generate the best assembly code it can. You will rarely understand the assembly code a modern compiler generates, because it tends to be really extreme. At times compilers fail at this task because they can't always foresee what's happening. Generally they do a great job, but they sometimes fail...

To sum up: knowing C and assembly is absolutely not enough to do a better job than a compiler in 99.99% of cases, and considering that programming something in C can be 10000 times faster than writing the same program in assembly, a nicer way to spend your time is optimizing what the compiler got wrong in the remaining 0.01%, not reinventing the wheel.

HellBack
A: 

This depends on the compiler you use; it is not a property of C or any other language. Theoretically it's possible to build a compiler with such sophisticated AI that it compiles Prolog to more efficient machine language than GCC can produce from C.

This depends 100% on the compiler and 0% on C.

What does matter is that C was designed as a language for which it is easy to write an optimizing compiler from C to assembly, where assembly means the instructions of a von Neumann machine. It depends on the target; some languages, like Prolog, would probably be easier to map onto hypothetical 'reduction machines'.

But, given that assembly is the target language of your C compiler (you could technically compile C to Brainfuck or to Haskell; there is no theoretical difference), then:

  • It is possible to write the optimally fast program in that assembly itself (duh)
  • It is possible to write a C compiler which in every instance produces the optimal assembly. That is to say, there exists a function from every C program to the optimal way to get the same I/O in assembly, and this function is computable, albeit perhaps not deterministically.
  • This is also possible with every other programming language in the world.
Lajla
A: 

No, compilers do not do a bad job at all. The amount of optimization that can be squeezed out by using assembly is insignificant for most programs.

That amount depends on how you define 'modern C compiler'. A brand-new compiler (for a chip that has just reached the market) may have a large number of inefficiencies that will get ironed out over time. Just compile some simple programs (the string.h functions, for example) and analyze what each line of code does. You may be surprised at some of the wasteful things an immature C compiler does, and you may recognize the errors with a simple read-through of the code. A mature, well-tested, thoroughly optimized compiler (think x86) will do a great job of generating assembly, though even a new one will still do a decent job.

In no case can C do a better job than assembly. You could just benchmark the two, and if your assembly turned out slower, compile your C with -S and submit the resulting assembly: you're guaranteed a tie. C is compiled to assembly, which has a 1:1 correspondence with the machine code. The computer can't do anything that assembly can't express, assuming the complete instruction set is published.

In some cases, C is not expressive enough to be fully optimized. A programmer may know something about the nature of the data that simply cannot be expressed in C in such a way that the compiler can take advantage of this knowledge. Certainly, C is expressive and close to the metal, and is very good for optimization, but complete optimization is not always possible.

A compiler can't define 'performance' the way a human can. I understand that you said trivial programs, but even in the simplest (useful) algorithms there will be a tradeoff between size and speed. The compiler can't make that tradeoff at a finer-grained level than the -Os/-O[1-3] flags allow, but a human can know what 'best' means in the context of the program's purpose.

Some architecture-dependent assembly instructions simply can't be expressed in C. This is where asm() statements come in. Sometimes these are not for optimization at all, but simply because there is no way to express in C that this line must use, say, an atomic test-and-set operation, or that we want to issue an SVC interrupt with the encoded parameter X.
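
For instance, a minimal sketch of such a statement, assuming GCC extended asm on x86, where `xchg` against a memory operand is an atomic test-and-set (the function name is invented for illustration):

    /* Illustrative only: try to acquire a spinlock with an atomic exchange.
       xchg with a memory operand is implicitly locked on x86. */
    static int test_and_set(volatile int *lock)
    {
        int old = 1;
        __asm__ volatile ("xchg %0, %1"
                          : "+r" (old), "+m" (*lock)
                          :
                          : "memory");
        return old;   /* 0: we got the lock; 1: it was already held */
    }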

The above points notwithstanding, C is orders of magnitude more efficient to program in and to master. If performance is important, analysis of the assembly will be necessary, and optimizations will probably be found, but the cost in developer time and effort is rarely worth it for complex programs on a PC. For very simple programs which must be as fast as absolutely possible (like an RTOS), or which have severe memory constraints (like an ATtiny with 1 KB of flash (non-writable) memory and 64 bytes of RAM), assembly may be the only way to go.

reemrevnivek
Not all assembly has a 1:1 match with the machine code - I have been working with a CPU that has a 'high-level assembly' that the assembler accepts.
Paul Nathan
@Paul I understand that you can use a high level assembly language if you want, on many processors including x86 - but I think that this is more properly a programming language in itself, not assembly as the question indicated. Is there a 'low level assembly' available for your processor? Even if there is no such assembler provided by the manufacturer, the output of your current assembler is just encoded low-level assembly.
reemrevnivek
It does translate into a low-level assembly - but that's not really supported for users. Most of the high-level stuff involves collapsing similar instructions into a single instruction with easier syntax. It doesn't have macro assembly type stuff.
Paul Nathan
A: 

It takes very little in the way of expertise to write assembly code that's faster than what a compiler normally generates. If you want the code to be a lot faster, that takes more work (and yes, some expertise doesn't hurt either), but simply doing a little better than a compiler takes little more than a reasonable grasp of the instruction set at hand. I've yet to see a single CPU/instruction set for which this wasn't true.

I do find it a bit humorous to note that compilers for the x86 are being held up as efficient, while compilers for RISC processors are noted as decidedly inferior. One of the basic notions of RISC was supposed to be that CISCs were designed largely for people to write assembly for, while RISC was intended primarily as a target for compilers. OTOH, there has been an extremely competitive market for compilers for the x86 for decades, and that does make a difference.

Jerry Coffin