You could quickly get up and running with Boost and ::boost::iterator_facade.
No, it wouldn't be optimal or portable, and iterator semantics are something you should hear Alexandrescu suddenly come out against at DevCon. You are not locking the container; you are locking (and potentially unlocking and relocking) each operation on it. And locking the operations means serial execution, it is that simple. Plenty of the iterator manipulation becomes an unnecessary penalty, paid purely for the abstraction being created.
Viewed from Mars, an iterator just hides a pointer, and hides it under a semi-OO concept that is as much at odds with concurrency as OO is with distributed development. I'd use a 'procedural' interface for sure and make the users/maintainers pay attention to why it is necessary. Lock-free ops are only as good as all the parallel code surrounding them. And the classic examples, where people keep taking credit for reinventing the scoped_lock wrapper that has been around since '96, produce pretty serial code.
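To make the 'procedural' point concrete, here is a minimal sketch using only the standard library. LockedVector, get, with_lock and lock_count are all made-up names for illustration, not anything from Boost or TBB. It compares iterator-style access, where every step takes the lock, with a procedural call that hands the whole operation over and locks once.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical wrapper for illustration: a vector guarded by a single mutex.
class LockedVector {
public:
    explicit LockedVector(std::vector<int> init) : data_(std::move(init)) {}

    // Iterator-style access: each call locks and unlocks on its own.
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        ++lock_count_;
        return data_.size();
    }

    int get(std::size_t i) const {
        std::lock_guard<std::mutex> guard(mutex_);
        ++lock_count_;
        assert(i < data_.size());
        return data_[i];
    }

    // Procedural access: the caller passes the whole operation,
    // and the lock is taken exactly once around it.
    template <typename F>
    void with_lock(F f) const {
        std::lock_guard<std::mutex> guard(mutex_);
        ++lock_count_;
        f(data_);
    }

    // Number of lock acquisitions so far (this query itself is not counted).
    std::size_t lock_count() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return lock_count_;
    }

private:
    mutable std::mutex mutex_;
    mutable std::size_t lock_count_ = 0;
    std::vector<int> data_;
};

// Locks once per size() check and once per element read.
long sum_per_element(const LockedVector& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v.get(i);
    }
    return total;
}

// Locks once for the whole traversal.
long sum_in_one_section(const LockedVector& v) {
    long total = 0;
    v.with_lock([&total](const std::vector<int>& data) {
        for (int x : data) {
            total += x;
        }
    });
    return total;
}
```

Note that the per-element version is not even atomic as a whole: another thread can change the vector between two calls, which is exactly the "locking the operations, not the container" problem. The procedural version makes the critical section visible to whoever maintains the code.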
Or use atomics, with Sutter's DDJ columns as a reference, as the poor man's way forward (and that is more than 10 years after the Pentium Pro introduced out-of-order execution).
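As a rough idea of what the atomics route looks like, here is a small sketch that assumes "the atomic" means std::atomic from the C++0x/C++11 library. Mailbox, produce, consume and run_handoff are hypothetical names. One thread publishes a plain value with a release store and another picks it up after an acquire load, so the ordering is stated explicitly instead of relying on a lock.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical one-shot mailbox: plain data published through an atomic flag.
struct Mailbox {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // flag that publishes the payload
};

// Writer: fill in the payload, then publish it with a release store.
void produce(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Reader: spin until the flag is seen with an acquire load.
// The acquire/release pair guarantees the payload write is visible here.
int consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Runs one producer and one consumer thread and returns what was received.
int run_handoff(int value) {
    Mailbox box;
    int received = 0;
    std::thread consumer([&box, &received] { received = consume(box); });
    std::thread producer([&box, value] { produce(box, value); });
    producer.join();
    consumer.join();
    assert(box.ready.load());
    return received;
}
```

Without the release/acquire pair (say, with a plain bool flag) the compiler and the CPU are both free to reorder the two writes, and the reader could see the flag before the payload. The spin loop also shows the other half of the argument: the handoff is only as good as the code around it.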
(All that is really happening is that Boost and DDJ are running after the .NET and MS CCR train, which is running after immutability, as well as the Intel train, which is running after a good OO-like abstraction for lock-free development. The problem is that it cannot be done well, yet some people fight that time and time again; much like the concurrent_vector nonsense in TBB. It is the same reason exceptions never turned out to be unproblematic, especially across environments, and the same reason vector processing in CPUs is underutilised by C++ compilers, and so on and on.)