views:

107

answers:

1

My application of MPI has some process that generate some large data. Say we have N+1 process (one for master control, others are workers), each of worker processes generate large data, which is now simply write to normal file, named file1, file2, ..., fileN. The size of each file may be quite different. Now I need to send all fileM to rank M process to do the next job, So it's just like all to all data transfer.

My problem is how should I use MPI API to send these files efficiently? I used to use windows share folder to transfer these before, but I think it's not a good idea.

I have think about MPI_file and MPI_All_to_all, but these functions seems not to be so suitable for my case. Simple MPI_Send and MPI_Recv seems hard to be used because every process need to transfer large data, and I don't want to use distributed file system for now.

+1  A: 

It's not possible to answer your question precisely without a lot more data, data that only you have right now. So here are some generalities, you'll have to think about them and see if and how to apply them in your situation.

  • If your processes are generating large data sets they are unlikely to be doing so instantaneously. Instead of thinking about waiting until the whole data set is created, you might want to think about transferring it chunk by chunk.
  • I don't think that MPI_Send and _Recv (or the variations on them) are hard to use for large amounts of data. But you need to give some thought to finding the right amount to transfer in each communication between processes. With MPI it is not a simple case of there being a message startup time plus a message transfer rate which apply to all messages sent. Some IBM implementations, for example, on some of their hardware had different latencies and bandwidths for small and large messages. However, you have to figure out for yourself what the tradeoffs between bandwidth and latency are for your platform. The only general advice I would give here is to parameterise the message sizes and experiment until you maximise the ratio of computation to communication.
  • As an aside, one of the tests you should already have done is measured message transfer rates for a wide range of sizes and communications patterns on your platform. That's kind of a basic shake-down test when you start work on a new system. If you don't have anything more suitable, the STREAMS benchmark will help you get started.
  • I think that a all-to-all transfers of large amounts of data is an unusual scenario in the kinds of programs for which MPI is typically used. You may want to give some serious thought to redesigning your application to avoid such transfers. Of course, only you know if that is feasible or worthwhile. From what little information your provide it seems as if you might be implementing some kind of pipeline; in such cases the usual pattern of communication is from process 0 to process 1, process 1 to process 2, 2 to 3, etc.
  • Finally, if you happen to be working on a computer with shared memory (such as a multicore PC) you might think about using a shared memory approach, such as OpenMP, to avoid passing large amounts of data around.
High Performance Mark