Supposing that memory is not an issue, does targeting a 64-bit OS make a C/C++ Windows console application run faster?

Update: Prompted by a few comments/answers: the application involves statistical algorithms (e.g., linear algebra, random number draws, etc.).

+3  A: 

The answer is maybe. You have to measure.

Using a 64-bit target allows the use of more registers, which implies fewer accesses to memory, and thus faster execution.

On the other hand, using a 64-bit target forces all pointers and addresses to be 64 bits wide, which enlarges the memory footprint and can slow execution.

Didier Trosset
A 64bit OS gives access to more CPU registers?
Martin Beckett
@Martin: I think for x86-64 you have double the number of GPRs compared to vanilla x86. Can make quite a difference if the compiler is smart.
@Johannes there's more to it: on x86_64 fewer of them are reserved for other purposes as well, so it's actually slightly more than double. Plus it does floating point with SSE instructions rather than x87, which can be good (if you don't use many things like sin/cos/sqrt/etc.; on x87 the hardware can do them, but with SSE it doesn't, and moving values from one to the other is expensive). @Martin it's not the OS that does it; it's that you're running in 64-bit mode, so you can see the new registers now.
@spudd86: As per usual, it's actually a little more complicated than that, thanks to modern processor advances. While it's true that the number of registers that can be named in the 32-bit x86 instruction set is fixed, the number of registers implemented by the CPU has grown over time. Modern x86 CPUs utilise these extra registers in conjunction with register renaming and out-of-order execution to minimise idle time while waiting for instruction and data fetches. This is probably still a gross simplification :)
@torak yea, but the same CPU running x86 or x86_64 code does pretty much the same stuff in terms of register renaming, OOOE, etc. and it doesn't really translate into more registers for your compiler to use, which you do get by going x86->x86_64
+1  A: 

The answer is a big Maybe.

Targeting a different platform will certainly have a performance impact on your application, as you're making a substantial change to it. The new target has different size semantics for types, different operations, and a very different operating system to interact with.

These factors and many others will certainly lead to a performance change in your application. Whether it is subtle, huge, better, worse, etc ... will be highly specific to the type of application you are writing. It's not possible to give a general answer here without more details.

Not to mention that poorly laid out data structures can (due to alignment constraints) inflate rather badly: e.g. `struct foo {int *p1; int i; int *p2; int j; };` would double in size even though the ints are still only 32 bits wide; changing it to `struct foo {int *p1, *p2; int i,j; };` will make it as small as it can be.
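The padding effect described in that comment can be checked directly with `sizeof`. The following is a minimal sketch, not taken from the thread: the struct names `FooInterleaved` and `FooGrouped` and the helper `print_layout_sizes` are made up for illustration, and the actual numbers depend on the compiler and target ABI.

```cpp
#include <cassert>
#include <iostream>

// Layout from the comment: pointers and ints interleaved.
// With 8-byte pointers, each int is typically followed by 4 bytes of padding.
struct FooInterleaved {
    int *p1;
    int  i;
    int *p2;
    int  j;
};

// Same members with the pointers grouped first, so the two ints can sit
// next to each other without padding between them.
struct FooGrouped {
    int *p1, *p2;
    int  i, j;
};

// Hypothetical helper: prints the sizes for whichever target this is built for.
void print_layout_sizes() {
    std::cout << "pointer:     " << sizeof(int *) << " bytes\n"
              << "int:         " << sizeof(int) << " bytes\n"
              << "interleaved: " << sizeof(FooInterleaved) << " bytes\n"
              << "grouped:     " << sizeof(FooGrouped) << " bytes\n";
}
```

Building this once as a 32-bit target and once as a 64-bit target and calling `print_layout_sizes()` shows how much each layout grows. On common ABIs with 8-byte pointers and 4-byte `int`, the interleaved layout typically comes out at 32 bytes and the grouped one at 24, while both are typically 16 bytes on a 32-bit target.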

Everything else being equal (which is unlikely) the extra data size of the 64bit (twice as much data to move around when dealing with pointers) would lead to expecting it to be slower.

But other factors (e.g. WoW64 overhead) could dominate things...

The only way would be to test your application on the hardware you are targeting.
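As a rough illustration of what such a test could look like, here is a small timing sketch that can be compiled for both a 32-bit and a 64-bit target and run on the same machine. The `dot` workload and the `time_dot_ms` helper are hypothetical stand-ins; a real comparison should time the application's own hot loops with realistic data sizes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in workload: a plain dot product of two equal-length vectors.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        sum += a[k] * b[k];
    }
    return sum;
}

// Runs dot() `repeats` times on vectors of length n and returns the elapsed
// wall-clock time in milliseconds. The results are accumulated into `sink`
// so the compiler cannot discard the work as unused.
double time_dot_ms(std::size_t n, int repeats, double& sink) {
    std::vector<double> a(n, 1.5), b(n, 2.0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        sink += dot(a, b);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Compare the two builds with the same optimisation settings, and repeat the measurement several times, since a single run can be noisy.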

You missed the fact that x86_64 has more registers and that the default floating point is SSE rather than x87.
The WoW64 overhead is minimal. General purpose code will run faster in WoW than in native x86-64, but do any math and default SSE will give you a very good speed improvement.

I know this isn't an answer, but can I please ask why "64-bit integer arithmetic is roughly four times faster in 64-bit mode, naturally."?

Is it just because they have more modern architecture and can use newer instruction sets, or does the larger word size make a difference?

Thanks :)

With a 32-bit instruction set and registers you're forced to do 64-bit arithmetic in multiple steps since you can't do it all at once (not with x86, at least). Why it's roughly 4 times, I can't say exactly but I'd say it needs four cycles instead of one to add two integers, then.
I see! 64-bit arithmetic is faster. But 32-bit arithmetic would be about the same, discounting the newer instruction sets. Thanks!
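To make the "multiple steps" point concrete, here is a sketch of a 64-bit addition built only from 32-bit operations, which is roughly the work a 32-bit target has to do (on x86 compilers typically emit an add followed by an add-with-carry). The `U64Parts` type and the function names are made up for illustration, and the sketch says nothing about the exact cycle counts or the "four times" figure.

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit halves, as a 32-bit target must do.
struct U64Parts {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Adds two 64-bit values using only 32-bit arithmetic.
U64Parts add64_with_32bit_ops(U64Parts a, U64Parts b) {
    U64Parts r;
    r.lo = a.lo + b.lo;                             // low halves, wraps modulo 2^32
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // wrapped around => carry out
    r.hi = a.hi + b.hi + carry;                     // high halves plus carry
    return r;
}

// Helpers for converting to and from a native 64-bit integer.
U64Parts split(std::uint64_t v) {
    return { static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32) };
}

std::uint64_t join(U64Parts p) {
    return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}
```

In 64-bit mode the same addition is a single register operation. Multiplication and division split across halves need considerably more steps than addition does.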

Possibly slower - you have just effectively halved the size of the CPU cache

Of course Intel and AMD's engineers know this so the memory manager does a lot of work to reduce the impact of 64bit wide pointers and integers where only the low 32bits are used

Martin Beckett
"halved the size of the CPU cache" - only for pointer-heavy stuff. int/float/double are all the same size on x86_64 (int is still only 32 bits), and anything that only works on the old registers can still use the same opcodes, so your code size won't go up too much.
unless on your architecture 'int' is now 64bit (as it was on the Alpha) or all opcodes are now 64bit wide
Martin Beckett
@spudd86, in x86_64 ints are 64-bit, perhaps you are confusing it with the int datatype in c/c++, or some other language, where the size is defined by the compiler and not the cpu.

If most of the execution time is spent doing math then you might get a benefit. In most cases this is not true. If you're doing Monte Carlo simulations of nuclear reactors or raytracing renders or something similar, you're probably going to see a big benefit. Otherwise, my SWAG is "not much benefit".

+3  A: 

Replying mostly to the edit, not the original question: I've ported one application that's heavy on statistics and (especially) linear algebra to run as 64-bit code. For that code, the effort was minimal, and we got about a 3:1 improvement in speed.

I suspect that much of the notion that there often won't be an improvement comes (usually indirectly) from companies who have code that won't be easy to port, and are doing their best to tell customers why it's a good idea to continue buying their program, even though it's still 32-bit code. Of the code I've ported (or just "recompiled" in most cases), none has come out any slower as 64-bit code, and most has come out at least a little faster.

Jerry Coffin
Thanks. Good to know before I launch into any such effort to port the code.