views: 569
answers: 10
I know this is a micro-optimization, so I ask out of pure curiosity.

Logically, a microprocessor does not need to compare all the bits of both operands of an inequality operator in order to determine a "TRUE" result: it could stop at the first pair of bits that differ. So is an inequality comparison any faster than an equality comparison?

Note, this is programming-related because it affects the execution speed of a program.

+7  A: 

Usually, the microprocessor performs the comparison with combinational logic gates, not step by step like that. It checks all the bits at once.

Mehrdad Afshari
A: 

The compare operation occurs on the rising (or maybe falling) edge of the microprocessor's clock signal. Then the next operation occurs on the next clock cycle. So in terms of execution speed, equality and inequality take the same amount of time for almost every processor on the market today.

I say almost because I recall reading about some processors that were supposed to be asynchronous (self-timed rather than clocked), so if the compare op really were quicker than the add op, then a set of n compares would take less time than n adds. But I'm about 99% sure that was just some research project and not a commercial product :)

Mark Rushakoff
You are talking about incredibly simple processors compared to modern CPUs. In modern CPUs, instructions are often re-ordered, executed simultaneously, and several are retired (completed) at once. Any assumptions you have about the physical order of instruction execution or dependencies between instructions are probably far too simple. In this example, an obvious potential optimisation would be for the CPU to decode two instructions, fuse them into one, and execute that in a single clock.
Tom Leys
Also, see the optimisation PDF from my other answer for more detail.
Tom Leys
The OP specifically mentioned microprocessors, as did I. My bad if starting with microprocessor, then just saying processor was ambiguous.
Mark Rushakoff
+2  A: 

Comparison is usually implemented as a subtraction that discards the numeric result and keeps only the status flags. The adder in the CPU operates on all bits simultaneously, so this is a constant-time operation.

Equality is then just determining whether the result is 0. On x86, flags are set as a result of the comparison, and the branch is taken via jz or jnz (jump if zero, jump if not zero). So no, there would be no real speed difference.

Other platforms (such as ARM and IA64) behave similarly.

Michael
A: 

The amount of time it takes to do a comparison like this is generally one clock cycle.

A 32-bit processor will do all 32 bits at once; a 64-bit processor will do 64 bits at once.

If there is a delay, or stall, in the pipeline it would be because the operand isn't available and had to be fetched. That's where the greatest overhead is. But that would have been done in a chunk appropriate for the architecture of the processor, so it still would've been pulled in as a 32- or 64-bit unit.

lavinio
+3  A: 

This depends on your platform, but in general, it will perform identically.

For example, on x86, you can see this by looking at how the assembly works. Check out the x86 assembly control-flow operations: whether you're doing equality or inequality, it's done as 2 operations.

First, you do a CMP (comparison) operation. You then do a check to see if the comparison is equal, not equal, etc. This is just checking the results of the compare - in both cases, you're doing 2 operations.

In many higher-level programming languages, however, things are different. Many languages define inequality in terms of equality: to check inequality, you do the equality check, then a second check to see whether it's false. This makes equality (microscopically) faster in those languages. Many languages also let you write both explicitly, but people tend to define inequality in terms of equality anyway, which again makes equality slightly faster in general.

Reed Copsey
As an added bonus, checking whether a value is equal or not equal to 0 is faster (no need to load a comparison value into a register).
Tom Leys
@Tom - most ISAs support immediate values, so comparing against a fixed value should be as fast as comparing against zero (there are exceptions of course).
Michael
+3  A: 

Sounds like you should read the Intel 64 and IA-32 Architectures Optimization Reference Manual.

Look in there for the "pipeline latency" and "pipeline delay" of the instructions you use. Suffice it to say that everything you do with ints takes around 1 clock cycle to execute (4 billion of those a second). Reading data from memory can take 100-1000 cycles, depending on how much data you are working with. That matters much more.

Tom Leys
A: 

The instructions themselves will execute at the same speed, as the other answers suggest.

Where you might encounter a difference would be in branch prediction or cache effects. This will vary from processor to processor and compiler to compiler, so it's impossible to make generalizations. If you are at the level where this would make a difference, the only way to know is to try it and measure.

Mark Ransom
This is true. Absent other hints, the processor will typically predict that branches are not taken, i.e. every if-statement body is assumed to execute. The compiler might realise the if is unlikely and structure it differently, or insert a branch hint.
Tom Leys
+1  A: 

There are a few minor cases where it may have some effect.

On ARM processors (for the 32-bit, non-Thumb instruction set architecture (ISA)), all instructions are conditional. Sometimes you can get away with an inner loop having a single branch (from the end to the start) despite multiple conditions. In a few cases, a logical compare (TEQ) that disturbs few flags (it affects negative (N) and zero (Z), but not carry (C) or overflow (V)) lets hairy code avoid a branch instruction (untaken).

Conversely, IIRC (I've never actually programmed it, but I looked at the output of a C compiler over a decade ago), the 68000 has a literal EOR/XOR instruction only for register D4. So an arithmetic comparison would probably be better (although you could still ignore the extraneous flags - the point is that the instruction set is a little irregular).

As mentioned by a previous poster, most of the action is higher up with memory, disk, network and web service latency.

Tom Hawtin - tackline
A: 

If you wanted to raise this to a more general question, you would have to consider a reasonable distribution of TRUE and FALSE answers, and you would have to consider arbitrary word-length, including longer than a register.

In searching algorithms (and sorting can be considered an extension of searching) it is more common to use operators like "<" or "<=" than "==". This is because the distribution of results from the "==" operator tends to be highly skewed toward "false", so it has low entropy (i.e. low information yield) per execution. That means it has to be executed more times to get the same information - witness linear search.

In either case, they take O(word length) bit-comparisons, although, if the word length is <= the register length, the comparisons take place in parallel, with possibly a small delay for carry-propagation. (Actually, as I think about it, in the typical unequal case, either comparison can stop at the first unequal bit, and if the probability of equality is small enough, that could occur quite early.)

Mike Dunlavey
A: 

One aspect everyone is assuming is that he's talking about register-level instructions. Everyone is right: at the CPU level it's basically moot. And even higher up, most high-level languages implement inequality as a negated equality test.

However, even higher up, the questioner's optimization would cut both ways: equality can be written just as efficiently as inequality.

Additionally, for those concerned with the assembly ops: the only difference between CMP and SUB is that CMP discards the numeric result and keeps only the flags. They are usually executed by the same part of the machine, since CMP must produce flags representing equality, less than, and greater than.

Adam Luter