views:

420

answers:

5

I suppose I am focussing on x86, but I am generally interested in the move from 32 to 64 bit.

Logically, I can see that constants and pointers, in some cases, will be larger, so programs are likely to be larger. And the desire to allocate memory on word boundaries for efficiency would mean more padding between allocations.

I have also heard that 32 bit mode on the x86 has to flush its TLB when context switching, because the 4G virtual address spaces of different processes can overlap.

So, what are the real benefits of 64 bit?

And as a supplementary question, would 128 bit be even better?

Edit:

I have just written my first 32/64 bit program. It makes linked lists/trees of 16 byte (32b version) or 32 byte (64b version) objects and does a lot of printing to stderr - not a really useful program, and not something typical, but it is my first.

Size: 81128 bytes (32b) v 83672 bytes (64b) - so not much difference

Speed: 17s(32b) v 24s(64b) - running on 32 bit OS (OS-X 10.5.8)

+4  A: 

Regardless of the benefits, I would suggest that you always compile your program for the system's default word size (32-bit or 64-bit). If you compile a library as a 32-bit binary and provide it on a 64-bit system, you will force anyone who wants to link with your library to provide their library (and all of its other library dependencies) as 32-bit binaries, even though the 64-bit versions are the default available. This can be quite a nuisance for everyone. When in doubt, provide both versions of your library.

As to the practical benefits of 64-bit... the most obvious is that you get a bigger address space, so if you mmap a file, you can address more of it at once (and load larger files into memory). Another benefit is that, assuming the compiler does a good job of optimizing, many of your arithmetic operations can be parallelized (for example, placing two pairs of 32-bit numbers in two registers and performing two adds in a single add operation), and big-number computations will run more quickly. That said, the whole 64-bit vs 32-bit question won't help you with asymptotic complexity at all, so if you are looking to optimize your code, you should probably be looking at the algorithms rather than constant factors like this.

EDIT:
Please disregard my statement about the parallelized addition. This is not performed by an ordinary add instruction... I was confusing that with some of the vectorized/SSE instructions. A more accurate benefit, aside from the larger address space, is that there are more general purpose registers, which means more local variables can be kept in the CPU register file, which is much faster to access than the program stack (which usually means going out to the L1 cache).

Michael Aaron Safyan
> "for example, placing two pairs of 32-bit numbers in two registers and performing two adds in a single add operation" Is there any compiler out there doing this? Also, it seems the same could be done on x86 using SSE instructions.
Suma
Thinking about such "two adds in one" more, it is nonsense, and no compiler can do it as an optimization, because an addition in the lower 32b could carry over into the higher 32b. You need SIMD instructions for this.
Suma
I guess if you were keen you could do multiple 16-bit arithmetic operations in 64-bit registers. It would seem to be messy, but I bet it has been done.
philcolbourn
'Constant factors' - sounds like something Brian Harvey would say.
philcolbourn
@Suma, sorry. You're right.
Michael Aaron Safyan
A: 

More data is transferred between the CPU and RAM for each memory fetch (64 bits instead of 32), so 64-bit programs can be faster provided they are written so that they properly take advantage of this.

codebolt
Actually, this isn't so: the memory bus is whatever width it is, which has nothing essential to do with the width of the processor's registers. Some 32 bit systems fetch 128 bits at a time; there are 64 bit systems that fetch 32 at a time, and even 32 bit systems that fetch memory no more than 8 bits at a time.
Andrew McGregor
OK, I wasn't aware of that - still, isn't it correct that a single mov instruction transfers 64 bits on a 64 bit CPU and 32 bits on a 32 bit CPU? So when copying a large amount of memory from point A to point B, this would at least mean fewer mov instructions would need to be executed on a 64-bit CPU (even if the memory bus is the bottleneck)?
codebolt
When moving a large amount of memory, you will use 128b SIMD instructions on both x86 and x64 anyway.
Suma
+1  A: 

Unless you need to access more memory than 32b addressing will allow you, the benefits will be small, if any.

When running on a 64b CPU, you get the same memory interface whether you are running 32b or 64b code (you are using the same cache and the same bus).

While the x64 architecture has a few more registers, which allows easier optimizations, this is often counteracted by the fact that pointers are now larger, so using any structures with pointers results in higher memory traffic. I would estimate the increase in overall memory usage of a 64b application compared to a 32b one to be around 15-30%.

Suma
A: 

In the specific case of x86 to x86_64, the 64 bit program will be about the same size, if not slightly smaller, use a bit more memory, and run faster. Mostly this is because x86_64 doesn't just have 64 bit registers, it also has twice as many. x86 does not have enough registers to make compiled languages as efficient as they could be, so x86 code spends a lot of instructions and memory bandwidth shifting data back and forth between registers and memory. x86_64 has much less of that, so it takes a little less space and runs faster. Floating point and bit-twiddling vector instructions are also much more efficient in x86_64.

In general, though, 64 bit code is not necessarily any faster, and is usually larger, both for code and memory usage at runtime.

Andrew McGregor
+4  A: 

I typically see a 30% speed improvement for compute-intensive code on x86-64 compared to x86. This is most likely due to the fact that we have 16 x 64 bit general purpose registers and 16 x SSE registers instead of 8 x 32 bit general purpose registers and 8 x SSE registers. This is with the Intel ICC compiler (11.1) on x86-64 Linux - results with other compilers (e.g. gcc) or with other operating systems (e.g. Windows) may be different, of course.

Paul R
By 'compute intensive' do you mean graphics, matrix, DFTs?
philcolbourn
@phil: yes, mainly image processing, mostly integer (fixed point), lots of SIMD code, etc.
Paul R