I need to MPI_Gatherv() a number of int/string pairs. Let's say each pair looks like this:

struct Pair {
  int x;
  unsigned s_len;
  char s[1]; // variable-length string of s_len chars
};

How to define an appropriate MPI datatype for Pair?
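One way to see why a single MPI datatype can't describe Pair directly: `sizeof(struct Pair)` is a compile-time constant that covers only the fixed header, while each instance's real footprint depends on its own `s_len`, and an MPI datatype likewise has one fixed extent. A minimal sketch, assuming C99 (it swaps the `s[1]` trick for a flexible array member `char s[]`); `pair_new` and `pair_size` are hypothetical helper names:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same fields as the question's Pair, but with a C99 flexible array
   member instead of the char s[1] "struct hack". */
struct Pair {
    int x;
    unsigned s_len;
    char s[];          /* s_len chars follow the fixed header */
};

/* Hypothetical helper: allocate one Pair with room for s_len chars
   (no terminator stored). Returns NULL on allocation failure. */
struct Pair *pair_new(int x, const char *str, unsigned s_len)
{
    struct Pair *p = malloc(offsetof(struct Pair, s) + s_len);
    if (p == NULL)
        return NULL;
    p->x = x;
    p->s_len = s_len;
    memcpy(p->s, str, s_len);
    return p;
}

/* Hypothetical helper: bytes actually occupied by this instance.
   sizeof(struct Pair) alone only counts the header, so it is the
   same for every instance no matter how long its string is. */
size_t pair_size(const struct Pair *p)
{
    return offsetof(struct Pair, s) + p->s_len;
}
```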

+2  A: 

Hi

I don't think you can do quite what you want with MPI. I'm a Fortran programmer, so bear with me if my understanding of C is a little shaky. It seems you want to pass a data structure consisting of 1 int and 1 string (which you pass by passing the location of the first character in the string) from one process to another? I think what you are going to have to do is pass a fixed-length string, which would therefore have to be as long as the longest string you really want to pass. The receive buffer for gathering these strings will have to be large enough to receive all the strings together with their lengths.

You'll probably want to declare a new MPI datatype for your structs; you can then gather these and, since the gathered data includes the length of the string, recover the useful parts of the string at the receiver.
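A minimal sketch of that fixed-length approach, assuming a hypothetical upper bound `MAX_S_LEN` and hypothetical helpers `fixed_pair_set`/`fixed_pair_get`. Because every record has the same size, an array of them could then be described to MPI with something like `MPI_Type_contiguous(sizeof(struct FixedPair), MPI_BYTE, &pair_type)` followed by `MPI_Type_commit(&pair_type)` (this assumes the same struct layout on all ranks; `MPI_Type_create_struct` is the field-by-field alternative):

```c
#include <string.h>

/* Hypothetical upper bound on string length: every record reserves
   this much space whether or not its string needs it. */
#define MAX_S_LEN 64

/* Fixed-size record: all instances have the same size, so one MPI
   datatype (or a plain byte count) can describe an array of them. */
struct FixedPair {
    int x;
    unsigned s_len;          /* how many chars of s are meaningful */
    char s[MAX_S_LEN];
};

/* Sender side: fill a record. Returns 0 on success, -1 if the string
   does not fit in MAX_S_LEN chars. */
int fixed_pair_set(struct FixedPair *p, int x, const char *str)
{
    size_t n = strlen(str);
    if (n > MAX_S_LEN)
        return -1;
    p->x = x;
    p->s_len = (unsigned)n;
    memset(p->s, 0, sizeof p->s);   /* don't ship uninitialized bytes */
    memcpy(p->s, str, n);
    return 0;
}

/* Receiver side: copy only the useful part of s into out, which must
   hold at least s_len + 1 chars. */
void fixed_pair_get(const struct FixedPair *p, char *out)
{
    memcpy(out, p->s, p->s_len);
    out[p->s_len] = '\0';
}
```

The price is that every record carries `MAX_S_LEN` chars regardless of how long its string really is.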

I'm not certain about this, but I've never come across truly variable message lengths of the kind you seem to want, and it does sort of feel un-MPI-like. It may be something implemented in the latest version of MPI that I've just never stumbled across, though the online documentation suggests not.

Regards

Mark

High Performance Mark
I hoped to avoid wasting space with fixed-length buffers. Another option I wanted to avoid is representing the array of len/chars pairs as 2 separate arrays: one of lens and one of chars. Thanks anyway.
Constantin
Mark, it's been a while since I played with MPI, but I'm fairly sure you're right here, at least for MPI circa 2005.
Paul Nathan
+2  A: 

In short, with a collective like MPI_Gatherv you can't send one message of variable size and have it land in a buffer of exactly the right size, because the receiver has to supply the counts up front. You'll either have to send a first message with the sizes of each string and then a second message with the strings themselves, or encode that metadata into the payload and use a statically sized receive buffer.
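The first option (sizes in one message, strings in another) amounts to splitting the pairs into two arrays. A minimal sketch of just the data handling, with hypothetical helpers `split_strings` and `get_string`; the MPI calls are omitted, and the x values would travel alongside `lens` or in their own array:

```c
#include <stdlib.h>
#include <string.h>

/* Split n strings into an array of lengths and one concatenated char
   buffer (no terminators). lens would be the first message, chars the
   second. The caller frees *lens and *chars.
   Returns 0 on success, -1 if allocation fails. */
int split_strings(const char *const *strs, size_t n,
                  unsigned **lens, char **chars, size_t *total)
{
    *total = 0;
    *lens = malloc((n ? n : 1) * sizeof **lens);
    if (*lens == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        (*lens)[i] = (unsigned)strlen(strs[i]);
        *total += (*lens)[i];
    }

    *chars = malloc(*total ? *total : 1);
    if (*chars == NULL) {
        free(*lens);
        return -1;
    }
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(*chars + off, strs[i], (*lens)[i]);
        off += (*lens)[i];
    }
    return 0;
}

/* Receiver side: copy string i out of the concatenated buffer into out,
   which must hold at least lens[i] + 1 chars. */
void get_string(const unsigned *lens, const char *chars, size_t i, char *out)
{
    size_t off = 0;
    for (size_t k = 0; k < i; k++)
        off += lens[k];          /* skip the strings before i */
    memcpy(out, chars + off, lens[i]);
    out[lens[i]] = '\0';
}
```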

If you must send only one message, then I'd forgo defining a datatype for Pair: instead, I'd create a datatype for the entire payload and dump all the data into one contiguous, untyped package. Then at the receiving end you could iterate over it, allocating the exact amount of space necessary for each string and filling it up. Let me whip up an ASCII diagram to illustrate. This would be your payload:

|..x1..|..s_len1..|....string1....|..x2..|..s_len2..|.string2.|..x3..|..s_len3..|.......string3.......|...
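On the sending side, a payload with that layout could be built roughly as follows. This is a sketch under assumptions: `pack_pairs` is a hypothetical name, and fields are copied with `memcpy` with no padding between them, so the int and unsigned sizes and byte order are those of the sending machine and the receiver must share them:

```c
#include <stdlib.h>
#include <string.h>

/* Pack n (x, string) pairs into one contiguous byte buffer laid out as
   x, s_len, chars, x, s_len, chars, ... with no padding between fields.
   Returns a malloc'd buffer (caller frees) and its size via *out_len,
   or NULL if allocation fails. */
unsigned char *pack_pairs(const int *xs, const char *const *strs, size_t n,
                          size_t *out_len)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len += sizeof(int) + sizeof(unsigned) + strlen(strs[i]);

    unsigned char *buf = malloc(len ? len : 1);
    if (buf == NULL)
        return NULL;

    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s_len = (unsigned)strlen(strs[i]);
        memcpy(buf + off, &xs[i], sizeof xs[i]);  off += sizeof xs[i];
        memcpy(buf + off, &s_len, sizeof s_len);  off += sizeof s_len;
        memcpy(buf + off, strs[i], s_len);        off += s_len;
    }
    *out_len = len;
    return buf;
}
```

The resulting buffer could then be handed to MPI as `(int)len` elements of `MPI_BYTE`.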

You send the whole thing as one unit (e.g. an array of MPI_BYTE), then the receiver would unpack it something like this:

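size_t off = 0;                      /* buf holds buf_len received bytes */
while (off < buf_len)
{
    int x;
    unsigned s_len;
    memcpy(&x, buf + off, sizeof x);          off += sizeof x;
    memcpy(&s_len, buf + off, sizeof s_len);  off += sizeof s_len;
    char *s = malloc(s_len + 1);              /* +1 for a terminating '\0' */
    memcpy(s, buf + off, s_len);              off += s_len;
    s[s_len] = '\0';
    /* ... store x and s somewhere ... */
}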

Note however that this solution only works if the data representation of integers and chars is the same on the sending and receiving systems.

suszterpatt
Packing everything into a contiguous buffer is what I finally settled on. One thing to note is that I had to use an additional MPI_Gather() to collect the payload sizes from each process. These payload sizes were used to calculate the size of the receive buffer and the displacement vector (http://www.mpi-forum.org/docs/mpi-11-html/node70.html).
Constantin
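The size-then-displacement step from that comment could look like the following sketch. `gatherv_displs` is a hypothetical helper: the displacements are an exclusive prefix sum of the gathered counts, in units of the receive type (bytes, when gathering `MPI_BYTE`). The MPI calls appear only in a comment and are not executed here:

```c
#include <stddef.h>

/* Given the payload size gathered from each of nprocs ranks, fill
   displs with the exclusive prefix sum of counts and return the total
   receive-buffer size. counts/displs play the roles of MPI_Gatherv's
   recvcounts/displs arguments.

   Root-side flow this assumes (sketch; my_len, sendbuf, etc. are
   hypothetical names):
     MPI_Gather(&my_len, 1, MPI_INT, counts, 1, MPI_INT, root, comm);
     total = gatherv_displs(counts, nprocs, displs);
     recvbuf = malloc(total);
     MPI_Gatherv(sendbuf, my_len, MPI_BYTE,
                 recvbuf, counts, displs, MPI_BYTE, root, comm);       */
int gatherv_displs(const int *counts, int nprocs, int *displs)
{
    int total = 0;
    for (int i = 0; i < nprocs; i++) {
        displs[i] = total;   /* rank i's bytes start here */
        total += counts[i];
    }
    return total;
}
```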
+1  A: 

MPI implementations do not inspect or interpret the actual contents of a message. Provided that you know the size of the data structure, you can express that size as some number of chars or ints. The MPI implementation will not know or care about the actual internal details of the data.

There are a few caveats: both the sender and receiver need to agree on the interpretation of the message contents, and the buffer that you provide on the sending and receiving side needs to fit into some definable number of chars or ints.
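To illustrate: a fixed-size struct can be shipped as `sizeof` bytes, and MPI never looks inside them. A minimal sketch with a hypothetical `struct Record` and illustrative helper names; the byte buffer would be sent as `sizeof(struct Record)` elements of `MPI_BYTE`:

```c
#include <string.h>

/* Hypothetical fixed-size record; its field types are arbitrary. */
struct Record {
    int id;
    double value;
};

/* Copy a record into a buffer of exactly sizeof(struct Record) bytes.
   To MPI this is just an opaque run of bytes. */
void record_to_bytes(const struct Record *r, unsigned char *buf)
{
    memcpy(buf, r, sizeof *r);
}

/* Receiver side: reinterpret the bytes. Only valid if both sides use
   the same struct layout and data representation. */
void record_from_bytes(const unsigned char *buf, struct Record *r)
{
    memcpy(r, buf, sizeof *r);
}
```

This relies on exactly the caveats above: sender and receiver must agree on the layout, because nothing in the bytes themselves describes it.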

semiuseless