I have a couple questions about Sutter's concurrency series on Dr. Dobbs.
1) Sutter recommends moving code that doesn't require locks out of critical sections, both to keep critical sections small and to avoid nesting them in the event that the called code itself enters a critical section. I have always done this, but as he points out in "Use Critical Sections (Preferably Locks) to Eliminate Races", the compiler is allowed to move code INTO a critical section. Is this then a reliable way to enforce lock hierarchies and keep from nesting critical sections (which can lead to deadlocks)?
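To make question 1 concrete, here is the kind of refactoring I mean. This is a minimal sketch of my own, not Sutter's code; `format_entry` and `shared_log` are hypothetical names for illustration:

```cpp
#include <mutex>
#include <string>
#include <vector>

std::mutex m;
std::vector<std::string> shared_log;

// Hypothetical helper: touches no shared state, so it needs no lock.
std::string format_entry(int id) {
    return "entry #" + std::to_string(id);
}

void log_entry(int id) {
    // Do the lock-free work first, outside the critical section...
    std::string entry = format_entry(id);

    // ...then hold the lock only for the shared-state mutation.
    std::lock_guard<std::mutex> lock(m);
    shared_log.push_back(std::move(entry));
}
```

My worry is exactly what Sutter describes: nothing I can see in the source stops the compiler from hoisting the call to `format_entry` down into the locked region, so if `format_entry` someday takes its own lock, the nesting I tried to avoid reappears silently.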
2) In "Maximize Locality, Minimize Contention", Sutter gives examples and makes recommendations about how to organize data to increase concurrency using knowledge of the hardware (e.g. cache size, exclusivity of a cache to a core or set of cores, etc.). One suggestion he makes is to add a 'dummy' padding member to a class, separating two contended members so that they land on separate cache lines and can be worked on concurrently from separate cores.

How the heck do I do this? Given the power that the compiler, processor, and cache have over my code, how do I know if this hardware-level design works? How do I convince someone looking at my code that it works? It seems like black magic to me.

So one of two things is true: (1) this article is blue-sky software development, or (2) I am missing out on some very cool knowledge and tricks. I suspect the latter. Can someone please tell me how/where to get started correctly designing concurrent code given hardware knowledge, or at least point me somewhere to start learning?
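Here is my attempt at the padding trick, so you can see what I'm unsure about. This is a sketch under my own assumptions (the 64-byte line size, the `Counters` struct, and the constant names are all mine, not from the article):

```cpp
#include <cstddef>  // offsetof
#include <new>      // std::hardware_destructive_interference_size (C++17, optional)

// The library constant is optional, so fall back to 64 bytes,
// which is an assumption (common on x86-64, but not guaranteed).
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kCacheLine = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kCacheLine = 64;
#endif

struct Counters {
    // alignas pushes each member onto its own cache line, so two
    // threads incrementing a and b don't invalidate each other's line.
    alignas(kCacheLine) long a;
    alignas(kCacheLine) long b;
};

// Compile-time check that the members really are a full line apart.
static_assert(offsetof(Counters, b) - offsetof(Counters, a) >= kCacheLine,
              "a and b may share a cache line");
```

The `static_assert` is the closest thing I have found to "convincing someone it works" from the source alone; whether it actually reduces contention on a given machine still seems to require measuring.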
Many thanks