Hi

Which would be cheaper on a 64-bit machine: a 32-bit operation or a 64-bit operation (for example, masking a 32-bit flag versus masking a 64-bit flag)?

+2  A: 

Generally speaking, a 64-bit operation and a 32-bit operation have the same cost. The 32-bit operation might take an extra instruction if the compiler needs to ensure that the upper 32 bits of a 64-bit register are cleared (or sign-extended), but that operation generally has little cost.

There might be some difference in instruction encoding that might make one take more space than the other, but that (and which way the advantage would lie) would depend on a number of factors.
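As a rough illustration of the two equivalent operations being compared, here is a minimal C sketch that tests one flag in a 32-bit word and in a 64-bit word. The function names are made up for this example. Once the flags are in a register, each test is typically a single AND (or bit-test) instruction, though the exact code generated depends on the compiler and target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: test one flag in a 32-bit flag word.
   `bit` must be in the range 0..31. */
static inline bool flag32_is_set(uint32_t flags, unsigned bit)
{
    assert(bit < 32);
    return (flags & (UINT32_C(1) << bit)) != 0;
}

/* Hypothetical helper: the same test on a 64-bit flag word.
   `bit` must be in the range 0..63. */
static inline bool flag64_is_set(uint64_t flags, unsigned bit)
{
    assert(bit < 64);
    return (flags & (UINT64_C(1) << bit)) != 0;
}
```

Comparing the compiler's assembly output for the two functions on your own target is the easiest way to see whether either one needs an extra instruction or a longer encoding.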

Michael Burr
With the caveat, of course, that the 64 bit operation would be operating on twice as much data, right?
tloflin
@tloflin - sure, but I guess I was thinking that the question was asking about operations that would be equivalent, such as `x |= 0x10ULL` on an `unsigned long long` vs. `x |= 0x10UL` on an `unsigned long`. In general I think it's something you shouldn't worry about performance-wise; worry about it data-requirements-wise, unless and until performance is known to be an issue of some sort.
Michael Burr
+1  A: 

It depends -- masking a flag will normally use an AND instruction, which will execute quickly (~1 cycle) once the data is in a register. Loading 64 bits of data from memory will generally be slower than loading 32 bits of data -- but if you're using more than 32 flags, you'll have to load more than 32 bits of data anyway, and handling the masking in one cycle will improve speed over doing it in two or three instructions. Whether any of this makes a difference to overall speed will generally depend on surrounding instructions -- for example, if the data is already in the cache anyway, you may not need to load it from memory.

In other words, it's difficult to make generalizations -- you just about have to look at a specific code sequence (not just one instruction, but a whole sequence) to say anything -- and the result for that sequence may not mean much about another sequence that initially looks almost identical.
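To make the "more than 32 flags" case concrete, here is a small C sketch under some assumptions: the `flags_2x32` layout and both function names are invented for illustration. It checks whether any flag in a mask is set, once with the flags held in a single 64-bit word and once with them split across two 32-bit words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: 64 flags stored as two 32-bit words. */
struct flags_2x32 {
    uint32_t lo; /* flags 0..31  */
    uint32_t hi; /* flags 32..63 */
};

/* Both layouts occupy the same number of bytes. */
static_assert(sizeof(struct flags_2x32) == sizeof(uint64_t),
              "sketch assumes no padding between the two halves");

/* One AND and one compare when all 64 flags live in one word. */
static inline bool any_set_1x64(uint64_t flags, uint64_t mask)
{
    return (flags & mask) != 0;
}

/* Two ANDs plus an OR when the same flags are split in halves. */
static inline bool any_set_2x32(struct flags_2x32 f, uint64_t mask)
{
    uint32_t mask_lo = (uint32_t)mask;
    uint32_t mask_hi = (uint32_t)(mask >> 32);
    return ((f.lo & mask_lo) | (f.hi & mask_hi)) != 0;
}
```

Both versions read the same 8 bytes, so any difference comes from the instruction count, and as noted above that only matters in the context of the surrounding code.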

Jerry Coffin
+2  A: 

As you don't specify an architecture, I can only suggest a general answer, because it depends on the operation and on the processor architecture in question. Once you have the data in a CPU register, most operations will usually take the same amount of time regardless of whether the value was originally 32 or 64 bits wide.

However, there can be some differences on some architectures in how the data gets into a register. Here are some situations where a "native" value may be faster than a smaller value on some hardware:

Fetching data

  • Fetching a "native sized" value may be faster than fetching a smaller value. That is, the processor may need to fetch 64 bits regardless, and then mask/shift off 32 bits of it to "load" a 32-bit value. This masking/shifting is not required when working on a 64 bit value, so it can possibly be loaded faster. (This goes against the intuitive idea that something twice as big might take twice as long to load).

  • Alternatively, if the bus can handle half-width fetches, then 32 bits may be loaded in the same time as a 64 bit value.

  • To confuse matters more, the CPU caches can change the results as well. Usually when you read one value from memory, a "line" of several adjacent memory locations is read into the cache, so that subsequent reads can be supplied from fast cache memory instead of requiring a full fetch from RAM. In that case, using 32-bit values can work out faster if you are accessing many values in sequence, as twice as many of them fit in each cache line, resulting in fewer cache misses.
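The cache-line point can be put into numbers with a short C sketch. It assumes a 64-byte cache line (common, but not universal) and an array that starts on a line boundary. The constant and the function name are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Check your own hardware. */
#define ASSUMED_CACHE_LINE_BYTES 64u

/* Number of cache lines covered by `count` contiguous elements of
   `elem_size` bytes, assuming the array starts on a line boundary. */
static inline size_t lines_touched(size_t count, size_t elem_size)
{
    assert(elem_size != 0);
    size_t bytes = count * elem_size;
    return (bytes + ASSUMED_CACHE_LINE_BYTES - 1) / ASSUMED_CACHE_LINE_BYTES;
}
```

Under these assumptions, `lines_touched(1024, sizeof(uint32_t))` gives 64 lines while `lines_touched(1024, sizeof(uint64_t))` gives 128, so a sequential pass over the 32-bit array pulls in half as many cache lines.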

Computation

  • The processor hardware may be optimised for dealing with 64-bit values, so calculating with 32-bit values may cause it more trouble and could slow things down. For example, it might be able to process a double (64-bit) value "natively" but have to convert a float (32-bit) value into a double before it can process it, then convert the result back to a float afterwards.

  • Alternatively, there may be 32-bit and 64-bit paths through the CPU, or the CPU may be able to do any conversions required in a way that does not affect the overall execution time of the instruction, in which case they may be calculated at the same speed.

  • This may affect complex operations (floating point) but is unlikely to be a problem with simple operations (AND, OR, etc.).
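The convert, compute, convert-back pattern described in the first bullet can be written out explicitly in C. This is only an illustration of what such hardware would be doing internally, not a claim about any particular CPU. The function name is hypothetical.

```c
#include <assert.h>

/* The sketch assumes the usual 4-byte float and 8-byte double. */
static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "sketch assumes 32-bit float and 64-bit double");

/* Hypothetical helper: multiply two floats by widening to double,
   computing at 64 bits, then narrowing the result back to float. */
static inline float mul_via_double(float a, float b)
{
    double wide = (double)a * (double)b; /* convert up and compute */
    return (float)wide;                  /* convert back down */
}
```

On hardware with a native 32-bit floating-point path, a plain `a * b` on two floats would skip both conversions.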

Jason Williams
