The fact that you don't see any problem on a uniprocessor machine, but see problems on a multiproc machine is an artifact of the relatively large granularity of thread context switching on a uniprocessor machine. A thread will execute for N amount of time (milliseconds, nanoseconds, whatever) before the thread scheduler switches execution to a different thread. A lot of CPU instructions can execute in the typical thread timeslice. You can think of it as having a fairly large chunk of "free play" exclusive processor time during which you probably won't run into resource collisions because nothing else is executing on the processor.
When running on a multiproc machine, though, CPU instructions in two threads execute exactly at the same time. The size of the "free play" chunk of time is near zero.
To reproduce a resource contention issue between two threads, you need to get thread 1 to be accessing the resource and thread 2 to be accessing the resource at the same time, or very nearly the same time.
In the large-granularity thread switching that takes place on a uniprocessor machine, the chances that a thread switch will happen exactly in the right spot are slim, so the program may never exhibit a failure under normal use on a uniproc machine.
In a multiproc machine, the instructions are executing at the same time in the two threads, so the chances of thread 1 and thread 2 accessing the same resource at the same time are much, much greater - thousands of times more likely than the uniprocessor scenario.
I've seen it happen many times: an app that has been running fine for years on uniproc machines suddenly starts failing all over the place when executed on a new multiproc machine. The cause is a latent threading bug in the original code that simply never hit the right coincidence of timeslicing to repro on the uniproc machines.
When working with multithreaded code, it is absolutely imperitive to test the code on multiproc hardware. If you have thread collision issues in your code, they will quickly present themselves on a multiproc machine.
As others have noted, don't use SuspendThread() unless you are a debugger. Use mutexes or other synchronization objects to coordinate between threads.