I'm trying to figure out the best practices when designing a parallel algorithm with respect to the model of data distribution. What are the pros and cons of block distribution versus cyclic distribution of data in memory? Any help would be appreciated.
Block distribution in shared memory is best suited for algorithms that break their work up into chunks that require little (or no) synchronisation while the algorithm runs.
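To make the two terms from the question concrete, here's a minimal sketch of the index-to-worker mappings (the function and variable names are just illustrative): block gives each worker one contiguous chunk, cyclic deals elements out round-robin.

```c
/* Sketch of block vs cyclic ownership for an array of n elements
 * shared among p workers (names are illustrative). */
#include <stdio.h>

/* Block distribution: element i belongs to worker i / ceil(n/p). */
static int block_owner(int i, int n, int p)
{
    int chunk = (n + p - 1) / p;   /* ceiling of n/p */
    return i / chunk;
}

/* Cyclic distribution: element i belongs to worker i mod p. */
static int cyclic_owner(int i, int p)
{
    return i % p;
}

int main(void)
{
    int n = 10, p = 3;
    printf(" i  block  cyclic\n");
    for (int i = 0; i < n; i++)
        printf("%2d  %5d  %6d\n", i, block_owner(i, n, p), cyclic_owner(i, p));
    return 0;
}
```

Printing the table makes the trade-off visible: block ownership keeps each worker's elements adjacent in memory (good for locality), while cyclic ownership spreads them out (good for load balance when the cost per element varies).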
Parallel algorithm design should focus on minimising synchronisation bottlenecks between processes. An example would be a Gauss-Seidel relaxation method on a 2-D grid, where the grid is split into horizontal strips (one per processor) and no synchronisation is performed between them while the algorithm runs. Each processor calculates an independent convergence value and terminates once its own strip has converged.
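A minimal OpenMP sketch of that scheme, with illustrative grid size and tolerance (N, TOL); each thread sweeps a block of interior rows and tracks its own convergence figure. Rows on strip boundaries are read without synchronisation, so neighbouring values may be stale, which this kind of chaotic relaxation tolerates (it slows convergence rather than breaking correctness):

```c
/* Sketch: Gauss-Seidel relaxation on a 2-D grid split into strips,
 * one per thread, with no inter-strip synchronisation. Compile with
 * something like: cc -fopenmp -O2 gs.c -lm */
#include <math.h>
#include <omp.h>
#include <stdio.h>

#define N   512      /* illustrative grid size */
#define TOL 1e-6     /* illustrative convergence tolerance */

static double grid[N][N];

int main(void)
{
    for (int j = 0; j < N; j++)
        grid[0][j] = grid[N-1][j] = 1.0;   /* hot top and bottom edges */

    #pragma omp parallel
    {
        int nt    = omp_get_num_threads();
        int id    = omp_get_thread_num();
        int first = 1 + id * (N - 2) / nt;        /* block distribution */
        int last  = 1 + (id + 1) * (N - 2) / nt;  /* of interior rows   */
        double diff;

        do {                       /* each thread iterates independently */
            diff = 0.0;
            for (int i = first; i < last; i++)
                for (int j = 1; j < N - 1; j++) {
                    double old = grid[i][j];
                    grid[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j]
                                       + grid[i][j-1] + grid[i][j+1]);
                    diff = fmax(diff, fabs(grid[i][j] - old));
                }
        } while (diff > TOL);      /* local, per-strip convergence test */
    }
    printf("centre value after convergence: %g\n", grid[N/2][N/2]);
    return 0;
}
```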
You should also take locality of reference into account, which can have a marked effect on both parallel and sequential algorithms.
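One way to see why locality matters for the block-vs-cyclic choice: a contiguous (block-like) traversal streams through cache lines, while a strided (cyclic-like) traversal touches a new line on almost every access. A rough, single-threaded microbenchmark under arbitrary sizes:

```c
/* Illustrative locality contrast: sum an array in contiguous order
 * versus strided order. Both sums visit every element exactly once;
 * the strided version is usually noticeably slower. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N      (1 << 24)
#define STRIDE 16            /* roughly one double-sized cache line apart */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    for (long i = 0; i < N; i++) a[i] = 1.0;

    clock_t t0 = clock();
    double s1 = 0.0;
    for (long i = 0; i < N; i++) s1 += a[i];            /* contiguous */
    clock_t t1 = clock();

    double s2 = 0.0;
    for (long k = 0; k < STRIDE; k++)                   /* strided */
        for (long i = k; i < N; i += STRIDE) s2 += a[i];
    clock_t t2 = clock();

    printf("contiguous: %.3fs  strided: %.3fs  (sums %g %g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, s1, s2);
    free(a);
    return 0;
}
```

The same effect hits cyclic distributions in shared memory: adjacent elements owned by different threads can land on the same cache line and cause false sharing.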
Quinn's "Parallel Programming in C with MPI and OpenMP" offers lots of examples of different ways to distribute data in parallel programming. There's even a decision-tree that helps you figure out which approach is most convenient, depending on your needs.