The real problem with developing for multicore chips is nothing to do with the actual techniques of multi-tasking (isolated memory, task-based, etc.), but more to do with the implementation of the threading model in the OS?
That's false. The OS works fine. The OS threading model works fine. Evidence: The OS works really well.
The "problem" with multi-threaded applications is the standard problem all software has:
Data Structures
Algorithms
When working in a threaded environment, you must actually design the updates and memory writes carefully to prevent race conditions that cause out-of-order writes to memory. And that's just hard. Most folks don't consider locking and mutable shared data structures carefully enough.
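To make the locking point concrete, here is a minimal Python sketch of a shared mutable counter. The names (Counter, run_workers) and the thread and iteration counts are made up for illustration. The increment is a read-modify-write, so without the lock two threads could read the same old value and one update would be lost.

```python
import threading


class Counter:
    """A shared mutable counter guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write a single critical section.
        with self._lock:
            self.value += 1


def run_workers(counter, n_threads=4, n_increments=10_000):
    """Hammer the counter from several threads and return the final value."""

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


total = run_workers(Counter())
```

With the lock in place, total is always n_threads * n_increments. Remove the lock and the result depends on how the threads happen to interleave.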
Read this: http://www.wilsonmar.com/1threads.htm. It's hard and some people get parts of it wrong. To me, the issue seems to be that a lot of people who are able to type code like to think they're capable of designing multi-threaded applications. Perhaps too many people are messing with multi-threaded applications when they shouldn't be.
Multi-threaded application design complexity is exacerbated by the fact that the documentation for the Intel processor family gives no complete, formal, precise definition of memory write ordering.
Read this: http://moscova.inria.fr/~zappa/readings/cacm10.pdf
This is very important material on why threading is made more complex by the Intel processor family.
And optimizing compilers, which may reorder or eliminate memory accesses, make this write-ordering problem yet more complex.
If you do functional programming with immutable data structures, optimization of memory writes may be simplified, leading to better performance.
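As a rough illustration of the immutable style (in Python rather than a functional language, and with a made-up Account type and with_interest function), each worker below only reads its input and returns a new value. Nothing shared is ever written, so there is no write ordering to get wrong.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    """Hypothetical immutable record; assigning to a field raises an error."""
    name: str
    balance: int


def with_interest(account):
    # Build and return a new Account; the input object is never modified.
    return replace(account, balance=account.balance + account.balance // 10)


accounts = (Account("a", 100), Account("b", 250))

# The threads share the input tuple, but only ever read from it.
with ThreadPoolExecutor(max_workers=2) as pool:
    updated = tuple(pool.map(with_interest, accounts))
```

The original accounts tuple is unchanged afterwards, and updated holds the new records in the same order as the inputs.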
Most of the time, most programs can be broken into concurrent processes using ordinary OS pipelines. The rest of the time, a simple message queue to pass work through a pipeline-like structure gets to a high level of concurrency with no design complexity. Why? No shared mutable data structures.
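A minimal sketch of the message-queue approach, assuming Python's standard queue and threading modules. The stage and run_pipeline helpers and the _DONE sentinel are invented for this example. Each stage owns its own work and talks to its neighbours only through queues, so no data structure is shared and no explicit locking appears in the code.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def stage(func, inbox, outbox):
    """Apply func to each item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass the shutdown signal downstream
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Run one thread per function, connected by queues like a shell pipeline."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()

    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for t in threads:
        t.join()
    return results


squares_plus_one = run_pipeline(range(5), [lambda x: x * x, lambda x: x + 1])
```

Because each stage is a single thread reading from a FIFO queue, results come out in input order. The same shape works with OS processes and pipes in place of threads and queues.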