OpenMP would be an easy way to play around with multicore programming in C++. The Wikipedia example doesn't do anything processor-intensive, but you could replace the 'cout' call with some independent, long-running function.
As far as 64-bit goes, most of your performance increase is going to come from a few places:
Increased throughput: because the general-purpose registers (and pointers) are 64 bits wide, the processor can move and operate on more data per instruction. For example, a 64-bit integer addition is one instruction instead of two. Take a look at some of Microsoft's benchmarks for Exchange Server; Exchange has moved to supporting 64-bit only because the throughput increases are so large.
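More registers: x86-64 doubles the general-purpose registers from 8 to 16, so most function parameters and the return value can be passed in registers. The System V AMD64 ABI passes the first six integer arguments in registers, and the Microsoft x64 convention passes the first four.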
In the 32-bit x86 ABI, some calling conventions (fastcall, for instance) pass one or two parameters in registers and push the rest onto the stack. With a common calling convention like cdecl, every parameter is pushed onto the stack, and only the return value comes back in a register (EAX). Since the stack lives in memory, each argument costs a store and a later load, which can add significant overhead in call-heavy code.