Hello,

I am a DSP/embedded software programmer looking to improve my assembly language programming skills. For the 7 years of my career I have been programming in C, Matlab, and a little assembly language (ARM assembly, DSP processor assembly).

Now I want to improve my assembly language coding skills (it can be any assembly language, it doesn't matter) by a big quantum and take them to an 'expert level'. I know that programming more in it is the way to get there, but what I am asking here is:

-People's experience in coding in assembly languages (any), gained over years of coding in assembly language.

-Guidelines to keep in mind while learning a new assembly language

-Specific tips and tricks to code efficiently and correctly in assembly languages

-How to efficiently convert given C code into optimal assembly code

-How to clearly understand given assembly code

-How does one keep track of which registers hold operands, the stack pointer, program counters, and how does one get closer to understanding the underlying architecture and the resources it provides for a programmer, etc.?

Basically, I want to get some "real life" tips from people who have done exhaustive and intensive assembly language programming.

Thank you.

-AD

A:

A good place to start would be Jeff Duntemann's book, Assembly Language Step-by-Step. The book is about x86 programming under Linux. As I recall, a previous version of the book covered programming under Windows. It's a beginner's book in that it starts at the beginning: bits, bytes, binary arithmetic, etc. You can skip that part if you like, but it might be a good idea to at least skim it.

I think the best way to learn ASM coding is by 1) learning the basics of the hardware and then 2) studying others' code. The book I mentioned above is worthwhile. You might also be interested in The Art of Assembly Language Programming.

I've done quite a bit of assembly language programming in my time, although not much in the last 15 years or so. As one commenter pointed out, the slight size and performance gains are hard to justify when I take into account the increased development and maintenance time compared to a high level language.

That said, I wouldn't discourage you from your quest to become more efficient with ASM. Becoming more familiar with how a processor works at that level can only improve your HLL programming skills, as well.

Jim Mischel
A:

My answer is generally... write a disassembler. You have touched on ARM; perhaps you know all of the ARM instructions, perhaps not. What about Thumb? ARM is a good one to learn with this method: it is both popular and fixed instruction length, so you can disassemble linearly from beginning to end.

I don't mean write a polished, SourceForge-worthy disassembler. Maybe write 5 or 10 lines of assembler at a time, max, maybe the same instruction with different registers, just enough to parse the binary with a big if-then-else tree or switches.

add r0,r0,#1
add r0,r1,#1
add r0,r2,#2
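As a rough illustration, here is what the start of that if-then-else tree might look like in C for the three lines above. It is a minimal sketch under stated assumptions: it assumes the assembler emitted the standard A32 encodings (0xE2800001, 0xE2810001, 0xE2820002), and it only understands ADD/SUB/MOV with an immediate operand, condition "always", and no flag setting. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v >> n) | (v << (32 - n)) : v;
}

/* Decode one A32 word into text. Handles only data-processing
   ADD/SUB/MOV with an immediate operand, cond == AL (0xE), S == 0.
   Returns 1 if decoded, 0 if it fell through to ".word". */
static int decode_arm(uint32_t insn, char *out, size_t outlen)
{
    unsigned cond   = (insn >> 28) & 0xF;
    unsigned opcode = (insn >> 21) & 0xF;  /* bits 24..21 */
    unsigned rn     = (insn >> 16) & 0xF;
    unsigned rd     = (insn >> 12) & 0xF;
    unsigned rot    = (insn >> 8) & 0xF;   /* rotate-right amount / 2 */
    uint32_t imm    = ror32(insn & 0xFF, 2 * rot);

    assert(out != NULL && outlen > 0);

    /* bits 27..25 == 001 selects data processing with an immediate */
    if (cond == 0xE && ((insn >> 25) & 0x7) == 0x1 && ((insn >> 20) & 1) == 0) {
        if (opcode == 0x4) {
            snprintf(out, outlen, "add r%u,r%u,#%u", rd, rn, (unsigned)imm);
            return 1;
        } else if (opcode == 0x2) {
            snprintf(out, outlen, "sub r%u,r%u,#%u", rd, rn, (unsigned)imm);
            return 1;
        } else if (opcode == 0xD) {
            snprintf(out, outlen, "mov r%u,#%u", rd, (unsigned)imm);
            return 1;
        }
    }
    snprintf(out, outlen, ".word 0x%08x", (unsigned)insn);
    return 0;
}
```

Feeding it 0xE2810001 should produce "add r0,r1,#1"; anything it does not recognise comes back as a `.word`, which is a handy signal of which branch of the tree to write next. Comparing against a real disassembler's output, as suggested further down, is what keeps a sketch like this honest.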

Your goal is to examine each bit in the opcode: to understand why you can only have 8-bit immediates, and why some processors only let you jump 127 or 128 bytes with a local conditional branch. You don't have to write a disassembler to do this, but for me it works to embed the information in my brain.
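To make the "why only 8-bit immediates" question concrete: in A32 data-processing instructions the immediate field is an 8-bit value rotated right by an even amount (0 to 30), so only some 32-bit constants are encodable. The sketch below (hypothetical helper names) checks encodability by undoing each possible rotation, and adds the range check for a short branch that stores a signed 8-bit displacement, as the 6502 branches and the Z80's JR do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rol32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Returns 1 and fills imm8/rot if value == imm8 rotated right by 2*rot,
   i.e. if it fits an A32 data-processing immediate; 0 if the assembler
   would need a different instruction, two instructions, or a literal load. */
static int arm_imm_encodable(uint32_t value, unsigned *imm8, unsigned *rot)
{
    assert(imm8 != NULL && rot != NULL);
    for (unsigned r = 0; r < 16; r++) {
        uint32_t candidate = rol32(value, 2 * r); /* undo a right rotate of 2*r */
        if (candidate <= 0xFF) {
            *imm8 = candidate;
            *rot = r;
            return 1;
        }
    }
    return 0;
}

/* A signed 8-bit displacement can only reach -128..+127 units. */
static int fits_rel8(int32_t displacement)
{
    return displacement >= -128 && displacement <= 127;
}
```

For example, 0xFF000000 is encodable (0xFF with a rotate field of 4), while 0x101 and 0x1FE are not: their set bits either span more than 8 bits or would need an odd rotation.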

In order to create all the possible opcodes/instructions to test the disassembler, you end up learning all the syntax nuances of the assembler you are using. The assembly language in the chip company's book is not necessarily the exact syntax used by every assembler for that processor family. The mrc/mcr instructions (ARM) are a good example of this. gas in particular is known for the horrible job it does of changing the syntax, making it more painful than the chip company's syntax and tools. It depends on what you are trying to do: if you just want to code a few lines or modify something, you don't need to know every corner case or assembler feature, but if you really want to learn the instruction set, then I recommend this approach.

I am also an embedded software engineer, primarily using C, but daily disassembling that C (using objdump, not my own tools) and examining the output, ensuring that this code is in this memory area and that code is over there, linker stuff. But sometimes I have to examine a simulation of the processor/chip and follow the instruction fetches and their associated I/O to trace the code through the simulation. Or I am debugging a board with a logic analyser on the RAM or some other bus. I have learned many different processors: 8, 16, 32, and 64 bit (and ones with register lengths not in that list), CISC, RISC, DSP, and a couple of microengines. I wrote a disassembler for every one of them (well, except the PDP-11 and x86, my first two instruction sets). It takes maybe an afternoon to learn a new ISA once you have seen a few of them. Switching, though, takes me a day or two: going from one I have been using daily for days/weeks/months to one I have not used in months/years. I don't think in all languages at once.

Disassembling variable-length instructions (most of the processors out there), really doing it right, is an art form in itself and WAY beyond what I am talking about. That is why I recommend only a handful of instructions at a time, and do not embed data among those instructions. Ideally, use this method when you have a working, good disassembler handy, so you can compare your output to that of a real, mostly tested and debugged disassembler.

Beyond disassembling, if you are really eager, writing an emulator is a good exercise; again I say writing rather than examining. Many cores have emulators, and you could just examine them instead of writing your own; what works for me may not work for you. I have only written a couple of these. This is not an afternoon project, but you do get a deeper understanding of how that processor family works.
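The core of such an emulator is a fetch/decode/execute step over a register file. Below is a deliberately tiny sketch in C, assuming the same A32 subset as a toy disassembler (ADD/SUB/MOV immediate, condition "always", no flag setting); `cpu_state` and `step` are hypothetical names, and r15 is simply treated as a byte-address program counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal machine state: r[15] doubles as the program counter (byte address). */
typedef struct {
    uint32_t r[16];
} cpu_state;

/* Execute one instruction from mem (an array of 32-bit words).
   Returns 1 on success, 0 when it stops (end of program or an
   instruction outside the supported subset). */
static int step(cpu_state *cpu, const uint32_t *mem, size_t nwords)
{
    assert(cpu != NULL && mem != NULL);
    uint32_t pc = cpu->r[15];
    if (pc % 4 != 0 || pc / 4 >= nwords)
        return 0;

    uint32_t insn = mem[pc / 4];              /* fetch */
    unsigned opcode = (insn >> 21) & 0xF;     /* decode */
    unsigned rn = (insn >> 16) & 0xF;
    unsigned rd = (insn >> 12) & 0xF;
    unsigned rot = (insn >> 8) & 0xF;
    uint32_t imm8 = insn & 0xFF;
    uint32_t imm = rot ? (imm8 >> (2 * rot)) | (imm8 << (32 - 2 * rot)) : imm8;

    /* Only cond == AL, data-processing immediate, S == 0, no r15 operands. */
    if ((insn >> 28) != 0xE || ((insn >> 25) & 0x7) != 0x1 ||
        ((insn >> 20) & 1) || rd == 15 || rn == 15)
        return 0;

    switch (opcode) {                         /* execute */
    case 0x4: cpu->r[rd] = cpu->r[rn] + imm; break; /* ADD */
    case 0x2: cpu->r[rd] = cpu->r[rn] - imm; break; /* SUB */
    case 0xD: cpu->r[rd] = imm; break;              /* MOV */
    default:  return 0;
    }
    cpu->r[15] = pc + 4;
    return 1;
}
```

Running `mov r0,#1`, `add r1,r0,#2`, `sub r2,r1,#1` (0xE3A00001, 0xE2801002, 0xE2412001) through it should leave r0=1, r1=3, r2=2. The real learning starts when you extend it: flags and condition codes, the shifter operand, loads and stores, branches, and the fact that on real A32 hardware reading r15 as an operand yields the current instruction's address plus 8, which this sketch sidesteps by refusing r15.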

Use whatever learning environment is best for you, be it disassembly, emulators, single-stepping through a GUI-based ISA simulator, books, or web pages. Learning assembler for one or many processors will definitely make your high-level programming better, even if you never actually write assembler but only examine it. Write some C code that uses arrays and pointers, with and without structures, loops and unrolled loops, and compile each of these with various compiler options: with and without debugging information, from no optimisation through to max/aggressive optimisation. (Compile for different processors and compare the differences in program flow, number of instructions, etc. LLVM is great for this.)
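A possible starting file for that experiment is below (call it `loops.c`; the name and function names are just for illustration). It is the same sum written with indexing, with a walking pointer, and unrolled by four, plus a structure access. Something like `gcc -O0 -c loops.c` versus `gcc -O2 -c loops.c` followed by `objdump -d loops.o` lets you compare the listings; adding `-g` and using `objdump -S` interleaves the source. If your Clang build includes the ARM backend, `clang -O2 --target=arm-none-eabi -S loops.c` is one way to see a second ISA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Array indexing. */
uint32_t sum_indexed(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer walking. */
uint32_t sum_pointer(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    for (const uint32_t *p = a, *end = a + n; p != end; p++)
        s += *p;
    return s;
}

/* Manually unrolled by four, with a tail loop for the remainder. */
uint32_t sum_unrolled(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)
        s += a[i];
    return s;
}

struct point { int32_t x, y; };

/* Structure access: watch the stride become sizeof(struct point). */
int32_t sum_x(const struct point *pts, size_t n)
{
    assert(pts != NULL || n == 0);
    int32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += pts[i].x;
    return s;
}
```

At high optimisation levels the first three often end up looking very similar, which is itself a useful lesson about what is and is not worth hand-tuning.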

In addition to making your high(er)-level coding better, you also learn which compilers are good, bad, and average; which gee-whiz syntax you should avoid even if it is part of some standard; and which syntax most compilers get right. I highly recommend trying as many different compilers as you can.

I recommend checking out vastly different families, ones with no/less inbreeding. I mentioned ARM/Thumb (and Thumb-2), which are definitely inbred, but popular, and will pay the bills so you can learn the others in your spare time. Go back to the 6802 or 68HC11, the 8088 and/or the Z80. The old PICs, pic12 or pic16 (the pic32 is just a MIPS). MIPS, PowerPC, AVR. I am a huge fan of the MSP430 instruction set: a very good one to learn, it has a PDP-11 feel and is compiler-friendly, but is sadly targeted at a niche market. The 8051 is, amazingly, still not dead yet. Most of the older ones have instruction set simulators in various forms (MAME, for example, has many), so you can take those simulators and print memory and registers as your program executes to watch, learn, and improve. Then compare those older ones with more modern ones. See why some ISAs at the same clock rate outperform others by leaps and bounds: some have a single accumulator, one register, maybe two or four, and to do anything useful you have to constantly load and store, taking several instructions for one real operation, whereas something more modern does that real operation in one, two, or three instructions/clocks simply by having more registers, or general-purpose registers instead of special-purpose ones.

An advanced topic is memory accesses. Thumb (not Thumb-2) is not as efficient as ARM; there is a noticeable overhead, 5-10% more instructions required for the same task. So why is Thumb considerably faster on a Game Boy Advance? The answer: mostly 16-bit memory buses with non-zero wait-state memory. The GBA does not have a cache, but it does have a prefetch mechanism on the ROM interface, and ROM timing is non-linear: the first read takes N clocks and reads from the sequential addresses that follow take M clocks (M is less than N), which makes ROM execute faster than RAM. Not knowing this can make the difference between success and failure for your embedded program on this platform and others. It goes way beyond understanding assembly language, but you can't get there without being able to read and understand the output of the compiler.
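The arithmetic behind that can be sketched with a simple model: a straight run of instructions fetched over a bus narrower than or equal to the instruction width, where the first access costs N cycles and each following sequential access costs M. The N=4, M=2 values used below are purely illustrative, not actual GBA timings, and the model ignores the prefetch buffer, branches, and execution time; `fetch_cycles` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles to fetch `count` sequential instructions of `insn_bytes` each
   over a `bus_bytes`-wide bus: first access costs n_first, each later
   sequential access costs m_seq. Assumes insn_bytes is a multiple of
   bus_bytes (e.g. 2- or 4-byte instructions on a 16-bit bus). */
static uint32_t fetch_cycles(uint32_t count, uint32_t insn_bytes,
                             uint32_t bus_bytes, uint32_t n_first, uint32_t m_seq)
{
    assert(bus_bytes > 0 && insn_bytes % bus_bytes == 0);
    uint32_t accesses = count * (insn_bytes / bus_bytes);
    if (accesses == 0)
        return 0;
    return n_first + (accesses - 1) * m_seq;
}
```

With N=4 and M=2, 40 ARM instructions need 80 bus accesses (4 + 79*2 = 162 cycles), while 44 Thumb instructions, 10% more, need only 44 (4 + 43*2 = 90 cycles). Under these assumptions the halved fetch traffic outweighs the extra instructions.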

Another advanced topic is caches. If you have access to something with a cache that you can turn on and off (say something from GamePark, a GP32 or Wiz, or an older iPod that you can run homebrew on), ideally something where you can control the instruction and data caches separately, you get a feel for a wholly different kind of optimisation. It is no longer about the fewest instructions, the fewest jumps/branches, and the fewest memory accesses. Now you have to deal with the length of the cache line and where instructions land within that cache line. Adding one, two, three, sometimes more nops at the beginning of a program (no really, literally adding a nop in start.S) can dramatically improve or ruin the performance of a program generated from the same (higher-level) source, compiler, and optimisation settings. You have to examine the instructions and understand the hardware to see why.
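One piece of the "why" is plain arithmetic: how many cache lines a hot loop straddles depends on where it starts. A minimal sketch, assuming a power-of-two line size and hypothetical addresses (`lines_touched` is an illustrative name):

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache lines touched by `len` bytes of code starting at `start`,
   for a cache line of `line` bytes (must be a power of two). */
static uint32_t lines_touched(uint32_t start, uint32_t len, uint32_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0 && len > 0);
    uint64_t first = start / line;
    uint64_t last = ((uint64_t)start + len - 1) / line;
    return (uint32_t)(last - first + 1);
}
```

A 16-byte loop body at 0x38 with 32-byte lines spans two lines (0x20-0x3F and 0x40-0x5F); shift it by two 4-byte nops to 0x40 and it fits in one. Real effects also depend on associativity, what else maps to the same sets, and the fetch unit, so this explains only part of why a nop in start.S can matter.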

Your questions specifically:

-People's experience in coding in assembly languages (any), gained over years of coding in assembly language.

see above

-Guidelines to keep in mind while learning a new assembly language

See above. Believe that processors are more similar than different: they load and store registers, and branch unconditionally and conditionally. The same handful of conditional branches are well known and used. Look for the common instructions first: load immediate, move from one register to another, register-based add, and, or, xor. Not all processors have a divide instruction (most don't), and more than you think don't have a multiply. Most that do have one cannot be used generically: if the operands and result of a multiply are all the same size register, then many combinations of operands will overflow the result.
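Both points can be made concrete in C. The sketch below shows restoring (shift-and-subtract) division, a classic approach on a CPU with no divide instruction that produces one quotient bit per iteration, and a widening multiply showing that a 32x32 product needs 64 bits. The names `udiv32` and `umul32_wide` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t quot, rem; } udiv_result;

/* Restoring division: shift the dividend in one bit at a time and
   subtract the divisor whenever it fits. */
static udiv_result udiv32(uint32_t n, uint32_t d)
{
    assert(d != 0);
    uint64_t rem = 0;    /* 64-bit so the shift below cannot overflow */
    uint32_t quot = 0;
    for (int bit = 31; bit >= 0; bit--) {
        rem = (rem << 1) | ((n >> bit) & 1u);
        if (rem >= d) {
            rem -= d;
            quot |= 1u << bit;
        }
    }
    udiv_result res = { quot, (uint32_t)rem };
    return res;
}

/* A 32x32 multiply needs a 64-bit result to hold every possible product. */
static uint64_t umul32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

For instance, 0xFFFFFFFF * 0xFFFFFFFF is 0xFFFFFFFE00000001, and keeping only the low 32 bits, as a same-size multiply does, leaves just 1.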

-Specific tips and tricks to code efficiently and correctly in assembly languages

Drive down the middle of the road; don't get into cool tricks specific to this assembler/compiler, or gee-whiz features of a language. Keep it simple. Some of my 20-year-old C code still compiles today on many compilers. I often find code out in the world only a few years old that doesn't compile today and has to be constantly maintained to perform the same function with new compilers, simply because of compiler or language tricks.

-How to efficiently convert given C code into optimal assembly code

Start with C (or another language), compile it and disassemble it, maybe at a few levels of optimization, maybe with a few different compilers. Then just fix up the problems. This is a fun task, but you really do fall into that gee-whiz trap. Often, saving 1 or 2 or 7 instructions out of 5 or 10 or 20 is not worth having to carry the assembler around with the C, putting you in a non-portable situation, or in a situation where the compiler may catch up in the next version or two and even exceed your abilities, because compilers know more of the instructions and how to use them than you do.

Where I use assembler most (other than booting, naturally) is actually for reading and writing registers or memory locations. Every compiler I have used has at some point failed to pick the right instruction, replaced a 32-bit store with an 8-bit one, that kind of thing. I actually waste instructions and clocks to implement peek and poke routines in assembler to ensure the compiler won't bury me. Memory copies and the like are generally really good in C libraries, but they are places where you can take advantage of an instruction set. So is using specific instructions that are not part of the language you are using and that the compiler doesn't recognise or optimise to: bit tests or bit sets, byte swapping if you have a byte or halfword swap instruction, certain rotates, shifts, or sign extensions.
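For comparison, the usual C-level approximation of peek/poke goes through a `volatile` pointer, as sketched below (`put32`/`get32` are hypothetical names). `volatile` stops the compiler from dropping or merging the accesses, but C does not strictly promise a single 32-bit store instruction, which is the gap the answer's assembly versions close. The byte swap is written in the plain shift-and-mask form that GCC and Clang often, though not always, turn into a single byte-reverse instruction (`rev` on ARMv6 and later, `bswap` on x86) when optimizing.

```c
#include <assert.h>
#include <stdint.h>

/* Write a 32-bit value to an address (e.g. a memory-mapped register). */
static inline void put32(uintptr_t addr, uint32_t value)
{
    assert((addr & 3u) == 0);            /* expect word alignment */
    *(volatile uint32_t *)addr = value;
}

/* Read a 32-bit value from an address. */
static inline uint32_t get32(uintptr_t addr)
{
    assert((addr & 3u) == 0);
    return *(volatile const uint32_t *)addr;
}

/* Reverse the byte order of a 32-bit word. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) |
           ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) |
           (x << 24);
}
```

Whether your compiler actually emits the instruction you hoped for is, of course, something to confirm in the disassembly.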

If you can find it (it is out there for free as part of the Black Book), read Michael Abrash's Zen of Assembly Language. Measure the execution time and test, test, test. No matter how good you think you are, the stopwatch will show the real winner. Hardware has made half of his teachings obsolete, but the thought process and the depth of examining code at that level of detail are still worth it (I have the original book in print, BTW). His later magazine articles went into superscalar processors and how simply rearranging some instructions, so they could be recognized and handed off to separate execution units, made the same instructions execute many times faster; they were interesting to read and understand. Here again, much of this has been buried in the noise by pipelines, more execution units, parallel processing, and faster clocks. Actually, it is all a result of horrible programming languages that are so inefficient that the hardware has to compensate. But that makes it even more fun for us when we can perform the same operation many thousands to tens of thousands of times faster than our peers.
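A bare-bones "stopwatch" harness in standard C might look like this. It uses `clock()`, which measures processor time with limited resolution, so each variant is run many times; the two workloads and all names are hypothetical stand-ins for the code you actually want to compare. Storing the result in a `volatile` sink keeps the work from being discarded outright.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

static volatile uint32_t sink;  /* results land here so the work is not discarded */

static void work_multiply(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc += i * 2u;
    sink = acc;
}

static void work_shift_add(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc += i << 1;
    sink = acc;
}

/* Processor time, in seconds, for `reps` calls of fn(n). */
static double seconds_for(void (*fn)(uint32_t), uint32_t n, int reps)
{
    assert(fn != NULL && reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

At -O2 a compiler may well turn both loops into identical, or even closed-form, code. That is exactly the kind of surprise that the stopwatch, together with a look at the disassembly, is there to catch.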

It is very, very easy to shoot yourself in the foot with this activity (improving C output with assembler), though, so proceed with caution. You have been warned.

-How to clearly understand given assembly code

That is the point of the exercise. If you are writing your own assembler and driving down the middle of the road, there is a subset of popular instructions that are easy to read and easy to write, and you know them well. Examining compiler-generated instructions is harder; the disassembler is as much of a help (or a problem) as the generated code. Old-school game ROMs written by hand in assembler or machine code are even harder.

-How does one keep track of which registers hold operands, the stack pointer, program counters, and how does one get closer to understanding the underlying architecture and the resources it provides for a programmer, etc.?

This is often beyond the assembler: you have to understand pipelines, prefetching, branch shadows, caches, write buffers, memory buses, and wait cycles.

Another answer, depending on what you were really asking here, is to know the compiler's calling convention. Are the operands for a function passed in r0, r1, r2, ..., and if so, how many go in registers before the rest go on the stack? Does this compiler put everything on the stack? Are the flags stored on the stack too? Where is the return address stored? These CAN vary between compilers for the same target, as with x86 in the old days (Zortech/Watcom vs. Microsoft/Borland), or for the same processor with the same compiler, as we see in modern times (ABI vs. EABI). Nowadays you may find an interface designed and defined by someone (the chip company itself?), and various compilers will meet that standard for various reasons: portability, marketing, laziness, etc. I find that by examining the disassembly and driving down the middle of the road, you can figure out the calling conventions without having to go and read a spec.
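A small probe function makes this easy to do. The sketch below takes six integer arguments, which is more than the ARM AAPCS passes in registers (r0-r3, with the rest on the stack) and exactly as many as the x86-64 System V ABI passes in registers (rdi, rsi, rdx, rcx, r8, r9). Compile it with optimisation for each target you care about and look at where the fifth and sixth arguments are read from. The function name is hypothetical, and the weights just keep each argument distinguishable in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Each argument gets a different weight so you can spot it in the listing. */
int32_t six_args(int32_t a, int32_t b, int32_t c,
                 int32_t d, int32_t e, int32_t f)
{
    return a + 2 * b + 3 * c + 4 * d + 5 * e + 6 * f;
}
```

Writing a caller for it and disassembling that as well shows the other half of the convention: which registers the caller loads and what it pushes before the call.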

I learned assembly language early on, and, often to the annoyance of my peers, I tend to re-use generic variables in my C as if I were writing assembler. So keeping track of what data is in what variable at what point in the program is, by habit, natural for me. YMMV. While analysing someone else's assembler or compiler output, I will hack up that output within the text editor I am using to read it: putting in visual spaces and blank lines between functional blocks, and making comments after each instruction as to what is now in a register. r0 holds the index number in a table, r1 now holds the word offset of that item in the table, r0 now holds the physical address of that item in the table, r2 now holds the item itself from the table, etc.
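As an illustration of that annotation habit, here is the table lookup just described, written as a short annotated A32 sequence (assuming, hypothetically, that the table base arrives in r3 and the index in r0), with a C function whose statements mirror each annotated step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each C statement mirrors one annotated instruction:

       lsl  r1, r0, #2     @ r1 = word offset of the item in the table
       add  r0, r3, r1     @ r0 = address of the item (r3 = table base)
       ldr  r2, [r0]       @ r2 = the item itself
*/
static uint32_t table_item(const uint32_t *table, size_t index)
{
    assert(table != NULL);
    size_t offset = index << 2;          /* r1: 4 bytes per entry */
    const uint32_t *addr =               /* r0: base + byte offset */
        (const uint32_t *)((const unsigned char *)table + offset);
    uint32_t item = *addr;               /* r2: the loaded item */
    return item;
}
```

Notice how r0 changes meaning from "index" to "address" halfway through; that is exactly what the running comments are for.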

Good luck, have fun, and sorry for the really long answer.

dwelch
A: 

The best way is to compile C code, look at the output, and look up the documentation for any instructions you don't know. The experience will come with time...

Quonux