While I like the intellectual challenge of designing multicore systems, I realize that for most of them it was just unnecessary premature optimization.
On the other hand, nearly every system has some performance requirement, and refactoring it later into thread-safe operations is hard, or even economically infeasible, because it would mean a complete rewrite around a different algorithm.
How do you keep a balance between optimization and getting things done?