I want to parallelize a program. It's not that difficult with threads working on one big data structure in shared memory. But I also want to be able to distribute it over a cluster, and I have to choose a technology for that. MPI is one idea.
The question is: what overhead will MPI (or another technology) have if I skip implementing a specialized version for shared memory and let MPI handle all cases?
Update:
I want to grow a large data structure (a game tree) simultaneously on many computers. Most of it will live on only one cluster node, but some of it (the irregular top of the tree) will be shared and synchronized from time to time.
On a shared-memory machine I would like this to go through shared memory. Can this be done generically? A rough sketch of what I mean is below.
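For illustration only, here is a minimal sketch of what I mean by "handled through shared memory" on a single node, assuming MPI-3 shared-memory windows (MPI_Comm_split_type with MPI_COMM_TYPE_SHARED plus MPI_Win_allocate_shared) are an acceptable mechanism. The window size and the idea of putting the shared tree top there are my assumptions, not working code from my program:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group the ranks that can physically share memory (same node). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Allocate a node-local shared window for the (hypothetical) shared
       tree top; only rank 0 on each node contributes the storage. */
    const MPI_Aint bytes = (node_rank == 0) ? (1 << 20) : 0;
    char *base;
    MPI_Win win;
    MPI_Win_allocate_shared(bytes, 1, MPI_INFO_NULL, node_comm, &base, &win);

    /* The other ranks on the node query rank 0's segment and get a
       pointer they can read/write directly, like ordinary shared memory. */
    MPI_Aint size;
    int disp_unit;
    char *shared_top;
    MPI_Win_shared_query(win, 0, &size, &disp_unit, (void *)&shared_top);

    printf("world rank %d sees a node-shared segment of %ld bytes\n",
           world_rank, (long)size);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```

The rest of the tree would stay in each rank's private memory, with explicit messages between nodes; the question is whether an approach like this (or another technology) gets me close to plain-threads performance on one machine without writing a separate shared-memory implementation.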