I know this question is moot when the problem is big enough, since at that scale you stop worrying about the inefficiencies. But for small problems that can run on one small machine or a couple of machines, should I go through the pain of designing parallel algorithms when I already have a Map/Reduce implementation at hand?
This is a difficult problem to analyze. On the one hand, if the problem is too small then classical complexity analysis is liable to give the wrong answer, because the lower order terms dominate for small N.
On the other hand, complexity analysis where one of the variables is the number of compute nodes will also fail if the number of compute nodes is too small ... once again because the overheads of the Map/Reduce infrastructure contribute to the lower order terms.
So what can you do about it? Well, one approach would be to do a more detailed analysis that does not rely on asymptotic complexity. Figure out the cost function(s), including the lower order terms and the constants, for your particular implementation of the algorithms and the Map/Reduce framework. Then substitute values for the problem size variables, the number of nodes, and so on. Complicated, though you may be able to get by with estimates for certain parts of the cost function.
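As a rough illustration of what such a cost model might look like, here is a minimal Python sketch. All of the constants (per-item compute cost, job setup time, per-task overhead, shuffle cost) are hypothetical placeholders that you would measure or estimate for your own implementation and framework; the point is just that the lower order terms appear explicitly instead of being thrown away.

    def serial_cost(n, c_item=1e-6):
        # Estimated runtime of the plain sequential algorithm:
        # per-item cost times problem size.
        return c_item * n

    def mapreduce_cost(n, nodes, c_item=1e-6, job_setup=5.0,
                       per_task_overhead=0.5, shuffle=1e-7):
        # Estimated runtime of the Map/Reduce version, keeping the
        # lower order terms that dominate for small n.
        compute = c_item * n / nodes            # ideal parallel speedup
        coordination = job_setup + per_task_overhead * nodes
        data_movement = shuffle * n             # serialization / shuffle cost
        return compute + coordination + data_movement

    # Substitute concrete problem sizes and node counts to see where
    # the crossover point lies for your (estimated) constants.
    for n in (10_000, 1_000_000, 100_000_000):
        print(n, serial_cost(n), mapreduce_cost(n, nodes=4))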
The second approach is to "suck it and see": implement it both ways, benchmark them on representative problem sizes and node counts, and let the measurements decide.
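A minimal sketch of such a measurement harness is below. Here run_serial, run_mapreduce and make_input are placeholders for your own code, not real APIs.

    import time

    def benchmark(fn, data, repeats=3):
        # Return the best wall-clock time over a few runs, which
        # filters out some noise from other processes on the machine.
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(data)
            best = min(best, time.perf_counter() - start)
        return best

    # Example usage with your two hypothetical implementations:
    # for n in (10_000, 1_000_000):
    #     data = make_input(n)
    #     print(n, benchmark(run_serial, data), benchmark(run_mapreduce, data))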