From the Wikipedia article on Read-Copy-Update:
The reason that it is safe to run the removal phase concurrently with readers is the semantics of modern CPUs guarantee that readers will see either the old or the new version of the data structure rather than a partially updated reference.
Is this true for all modern CPUs (ARM, x86, PPC, etc.)? Is it likely to change in the future? It seems awfully nice never to have to pay the cost of a locked (atomic read-modify-write style) load, as long as you don't mind occasionally reading the old value again. That is probably not an issue for many applications, and certainly not for any application that could use read-copy-update in the first place.
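
To make the question concrete, here is a minimal sketch of the pattern I have in mind, written with C11 atomics rather than relying on the hardware behaviour directly. The names (`struct config`, `current_config`, `reader_sum`, `publish`) are hypothetical and are not taken from any real RCU implementation. I use acquire/release ordering as a conservative assumption; the grace-period wait before freeing the old object is deliberately not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical payload; illustrative only, not from any real RCU API. */
struct config {
    int a;
    int b;
};

/* The shared pointer that readers dereference. Zero-initialised to NULL. */
static _Atomic(struct config *) current_config;

/*
 * Reader: a single atomic pointer load, no lock taken.
 * It observes either the old object or the new one, never a
 * half-written pointer. Returns -1 if nothing has been published yet.
 */
static int reader_sum(void)
{
    struct config *c =
        atomic_load_explicit(&current_config, memory_order_acquire);
    return c ? c->a + c->b : -1;
}

/*
 * Updater: build the new version privately, then publish it with one
 * atomic exchange. Returns the previous pointer; the caller must wait
 * until no reader can still hold it before freeing (not shown here).
 */
static struct config *publish(int a, int b)
{
    struct config *fresh = malloc(sizeof *fresh);
    assert(fresh != NULL);
    fresh->a = a;
    fresh->b = b;
    return atomic_exchange_explicit(&current_config, fresh,
                                    memory_order_acq_rel);
}
```

A reader that races with `publish` may still get the old object, which is exactly the "possibly getting the old value again" case. What I am asking is whether the pointer load in `reader_sum` can be relied on to be indivisible on every architecture when it is an ordinary aligned load.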