Well, I certainly agree, it is pretty horrible that such an implementation detail is exposed. It is however the exact same kind of detail that's exposed by the lock keyword. We are still very far removed from that bug generator to be completely removed from our code.
The hardware guys have a lot of work to do. The volatile keyword matters a lot of CPU cores with a weak memory model. The marketplace isn't been kind to them, the Alpha and the Itanium haven't done well. Not exactly sure why, but I suspect that the difficulty of writing solid threaded code for these cores has a lot to do with it. Getting it wrong is quite a nightmare to debug. The verbiage in the MSDN Library documentation for volatile applies to these kind of processors, it otherwise is quite inappropriate for x86/x64 cores and makes it sound that the keyword is doing far more than it really does. Volatile merely prevents variable values from being stored in CPU registers on those cores.
Unfortunately, volatile still matters on x86 cores in very select circumstances. I haven't yet found any evidence that it matters on x64 cores. As far as I can tell, and backed up by the source code in SSCLI20, the Opcodes.Volatile instruction is a no-op for the x64 jitter, changing neither the compiler state nor emitting any machine code. That's heading the right way.
Generic advice is that wherever you're contemplating volatile, using lock or one of the synchronization classes should be your first consideration. Avoiding them to try to optimize your code is a micro-optimization, defeated by the amount of sleep you'll lose when your program is exhibiting thread race problems.