I am taking an assembly course now, and the guy who checks our home assignments is a very pedantic old-school optimization freak. For example he deducts 10% if he sees:

mov ax, 0

instead of:

xor ax,ax

even if it's only used once.

I am not a complete beginner in assembly programming, but I'm not an optimization expert, so I need your help with something (it might be a very stupid question, but I'll ask anyway): if I need to set a register value to 1 or (-1), is it better to use:

mov ax, 1

or do something like:

xor ax,ax
inc ax

I really need a good grade, so I'm trying to make it as optimized as possible. (I need to optimize both time and code size.)

+3  A: 

It's better to use xor eax,eax; inc eax. It's shorter (3 bytes) than mov eax,1 (5 bytes). If you're on Windows, you can experiment with OllyDbg by typing your asm code in directly and seeing its size. For other instructions you can refer to various online references (e.g. for mov). It may depend on your processor architecture, though. In any case, the best approach is to check the CPU manual and the instruction size in your debugger to be sure.

Yasir Arsanukaev
I'm using 8086 assembly, not the 386 and newer, so I'm only using 16-bit registers: no EAX, just plain old AX. And if I'm not mistaken, mov ax,1 takes 3 bytes on the 8086, doesn't it?
Bob
+3  A: 

A quick Google search for "8086 instruction timings size" turned up http://8086.tk/ which seems to have all the timings and sizes for the 8086 (and more) instruction sets.

No doubt you could find official Intel doco on the web with similar information.

For your specific question:

xor ax,ax
inc ax

takes 3+3=6 clock cycles and 2+1=3 bytes while

mov ax,1

takes 4 clock cycles and 3 bytes.

So the latter is better in that case.
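As a rough illustration of the size side of that comparison, here is a minimal sketch in NASM syntax (an assumption; the thread does not say which assembler is in use) with the 16-bit encodings in the comments. The `dec ax` variant for -1 is added only because the question also mentions -1.

```asm
; Sketch only: NASM syntax and directives are assumptions, not from the thread.
cpu 8086
bits 16

        ; Two-instruction form for AX = 1
        xor     ax, ax          ; 31 C0     2 bytes, AX = 0, rewrites the flags
        inc     ax              ; 40        1 byte,  AX = 1, changes flags except CF

        ; One-instruction form for AX = 1
        mov     ax, 1           ; B8 01 00  3 bytes, AX = 1, flags untouched

        ; Same pattern for AX = -1
        xor     ax, ax          ; 31 C0     2 bytes
        dec     ax              ; 48        1 byte,  AX = 0FFFFh
        mov     ax, -1          ; B8 FF FF  3 bytes, AX = 0FFFFh
```

Both forms come out at 3 bytes in 16-bit code, so the choice rests on timing and on whether the flags need to survive.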


But you need to talk to your educational institute about this guy. 10% for a simple thing like that beggars belief.

You should ask what should be done in the case where you have two possibilities, one faster and one shorter.

Then, once they've admitted that there are different ways to code depending on what you're trying to achieve, tell them that what you're trying to achieve is readability and maintainability and seriously couldn't give a flying leap about a wasted cycle or byte here or there*a.

Optimisation is something you generally do if and when you have a performance problem, after a piece of code is in a near-complete state - it's almost always wasted effort when the code is still subject to a not-insignificant likelihood of change.

For what it's worth, sub ax,ax appears to be on par with xor ax,ax in terms of clock cycles and bytes, so maybe you could throw that into the mix next time to cause him some more work.

*a) No, don't really, but it's fun to vent occasionally :-)

paxdiablo
thanks for the info, guess I'll use the xor inc option
Bob
While *in general* I agree that readability and maintainability are to be preferred to optimizations, in this specific case we are talking about assembler. Assembler is pretty much unreadable by definition, and higher level languages exist *because* people didn't want to code in assembler. Hence, in this context, where readability is just fucked up anyway, the more you optimize, the better. Of course, you'd better add tons of comments.
Lo'oris
There's nothing the least bit unreadable about `mov ax,1`. You would have to go to the next level of asm coder to be similarly comfortable with `xor ax,ax; inc ax`. There are levels of readability even in assembly. Myself, I would make a macro `set_1 ax` which just translated to the latter, gaining both readability _and_ speed/size :-)
paxdiablo
@Bob, sorry mate, I made a mistake in leaving out the cost of the `inc ax` - it turns out the `mov ax,1` is actually just as short and faster (and more readable).
paxdiablo
our professor said something like: "I know that in most cases these optimizations are irrelevant and insignificant but you guys should know about them because someday you just might need to do one." and also something like "In my time you could really see the difference in performance"
Bob
@Bob: That would make sense if you were developing your own compiler; I doubt you'd think about it while solving other tasks. Compilers often do this kind of optimization automatically.
Yasir Arsanukaev
`sub ax,ax` and `xor ax,ax` might seem similar, but modern processors know about `xor` not having a real dependency on `ax` value; it is not so certain with `sub`.
liori
@liori, that was specifically for the 8086; I don't know if it had all that you-beaut stuff. But it seems to me that the dependencies and effects for xor ax,ax and sub ax,ax are exactly the same, as would be xor ax,N and sub ax,N where N is any type of object.
paxdiablo
@yasir, you're right, it's been a _long_ time since I could out-optimise a compiler :-)
paxdiablo
+1  A: 

Depending upon your circumstances, you may be able to get away with ...

 sbb ax, ax

The result will either be 0 if the carry flag is not set or -1 if the carry flag is set.
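A minimal sketch of how that might look in practice, in NASM syntax (an assumption). The `cmp bx, cx` is a hypothetical stand-in for whatever earlier instruction happens to leave the carry flag in the state you want, and the trailing `neg` is an optional extra for getting 1 instead of -1.

```asm
; Sketch only: BX and CX are hypothetical inputs chosen for illustration.
cpu 8086
bits 16

        cmp     bx, cx          ; sets CF when BX < CX (unsigned compare)
        sbb     ax, ax          ; AX = AX - AX - CF: 0FFFFh if CF was set, else 0
        neg     ax              ; optional: turns 0FFFFh into 1, leaves 0 as 0
```

This only pays off when the carry flag is already known or already computed; setting it up just for the `sbb` would cost more than a plain `mov`.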

However, if the above example is not applicable to your situation, I would recommend the

xor  ax, ax
inc  ax

method. It should satisfy your professor for size. However, if your processor employs any pipelining, I would expect there to be some coupling-like delay between the two instructions (I could very well be wrong on that). If such a coupling exists, the speed could be improved slightly by reordering your instructions to put another instruction between them (one that does not use ax).

Hope this helps.

Sparky
A: 

I would use mov [e]ax, 1 under any circumstances. Its encoding is no longer than the hackier xor sequence, and I'm pretty sure it's faster just about anywhere. The 8086 is just weird enough to be the exception, and as that thing is so slow, a micro-optimization like this would make the most difference there. But anywhere else, executing 2 "easy" instructions will always be slower than executing 1, especially if you consider data hazards and long pipelines. You're trying to read a register in the very next instruction after you modify it, so unless your CPU can bypass the result from stage N of the pipeline (where the xor is executing) to stage N-1 (where the inc is trying to load the register, never mind adding 1 to its value), you're going to have stalls.

Other things to consider: instruction fetch bandwidth (moot for 16-bit code, both are 3 bytes); mov avoids changing flags (more likely to be useful than forcing them all to zero); depending on what values other registers might hold, you could perhaps do lea ax,[bx+1] (also 3 bytes, even in 32-bit code, no effect on flags); as others have said, sbb ax,ax could work too in some circumstances - it's also shorter at 2 bytes.
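To make the flags point concrete, here is a small sketch in NASM syntax (an assumption). The `cmp si, di` and the `below` label are hypothetical, standing in for any code whose flag results are still needed after AX has been loaded.

```asm
; Sketch only: SI, DI and the label name are hypothetical.
cpu 8086
bits 16

        cmp     si, di          ; flags from this compare are needed later
        mov     ax, 1           ; B8 01 00  AX = 1, CMP flags survive
        jb      below           ; still branches on the CMP result

        lea     ax, [bx+1]      ; 8D 47 01  AX = BX + 1, flags untouched;
                                ;           yields 1 only if BX is known to be 0
below:
```

Replacing the `mov` with `xor ax,ax` / `inc ax` in that position would overwrite the compare's flags before the `jb` reads them.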

When faced with these sorts of micro-optimizations you really should measure the alternatives instead of blindly relying even on processor manuals.

P.S. New homework: is xor bx,bx any faster than xor bx,cx (on any processor)?

Bernd Jendrissek
+2  A: 

You're better off with

mov AX,1

on the 8086. If you're tracking register contents, you can possibly do better if you know that, for example, BX already has a 1 in it:

mov AX,BX

or if you know that AH is 0:

mov AL,1

etc.
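For reference, a sketch of those two shortcuts with their 16-bit encodings, in NASM syntax (an assumption). Each one is only correct under the stated precondition, which the programmer has to track by hand.

```asm
; Sketch only: the preconditions in the comments must already hold.
cpu 8086
bits 16

        mov     ax, bx          ; 89 D8  2 bytes, valid only if BX is known to hold 1
        mov     al, 1           ; B0 01  2 bytes, valid only if AH is known to be 0
```

Each saves one byte over the 3-byte `mov ax,1`.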

Walter Bright