For my application, I need to be able to send a std::vector<std::string> over a (local) UNIX socket and get a copy of the vector on the other end of the socket. What's the easiest way to do this with O(1) messages relative to the size of the vector (i.e. without sending a message for each string in the vector)?

Since this is all on the same host, and because I control both ends of the socket, I'm not concerned with machine-specific issues such as endianness or vector/string representation.

I'd like to avoid using any external libraries for a variety of reasons.

+2  A: 

Packing data structures for transmission and reception is usually called serialization.

One option you could use: the Boost serialization library can serialize STL vectors.

Another would be to roll your own, which shouldn't be difficult in this case. You could, for example, concatenate all the strings of the vector into a single string (with each constituent NUL-separated), send that buffer, and then split it back apart on the receiving side.
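A minimal sketch of the pack/unpack step just described, assuming (as the asker later confirms) that no string contains an embedded NUL. `pack` and `unpack` are hypothetical helper names; each string is terminated with a NUL so that empty strings survive the round trip.

```cpp
#include <string>
#include <vector>

// Join the strings into one buffer, terminating each with a NUL byte:
// "<s1>\0<s2>\0...<sN>\0". Assumes no string contains '\0'.
std::string pack(const std::vector<std::string>& vec) {
    std::string buf;
    for (const std::string& s : vec) {
        buf += s;
        buf += '\0';  // terminator doubles as the separator
    }
    return buf;
}

// Split a buffer produced by pack() back into its strings.
std::vector<std::string> unpack(const std::string& buf) {
    std::vector<std::string> vec;
    std::string::size_type start = 0;
    while (start < buf.size()) {
        std::string::size_type end = buf.find('\0', start);
        if (end == std::string::npos)
            end = buf.size();  // tolerate a missing final terminator
        vec.push_back(buf.substr(start, end - start));
        start = end + 1;
    }
    return vec;
}
```

The packed buffer can then go out in a single write() preceded by its length, so the receiver knows exactly how many bytes to read before calling unpack().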

Eli Bendersky
Can you give an example (in code) of how one would pack and unpack the buffer of strings?
Mike
@Mike: I believe Eli's answer contains a perfectly reasonable way to approach this problem that doesn't rely on external libraries.
Billy ONeal
@Billy: to be fair, Mike commented *while* I was adding that paragraph
Eli Bendersky
@Eli: Ah -- no edit showed up :P
Billy ONeal
+1  A: 

I'm sure I will get yelled at by C++ zealots for this, but try writev(2) (a.k.a. scatter/gather I/O). You would still have to deal with the zero separators on the receiving side, though.
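A minimal sketch of the writev(2) idea, assuming a NUL-separated payload preceded by a 4-byte host-order length (acceptable here because both ends share a host). The function name `send_with_writev` is hypothetical. Each string and each separator gets its own `iovec`, so the strings are never copied into an intermediate buffer.

```cpp
#include <cstdint>
#include <string>
#include <vector>
#include <sys/uio.h>  // writev, struct iovec
#include <unistd.h>   // sysconf

// Sends "<total><s1>\0<s2>\0...<sN>\0" with one writev() call. total is the
// number of bytes after the prefix, as a host-order uint32_t (same host only).
bool send_with_writev(int fd, const std::vector<std::string>& vec) {
    static const char nul = '\0';
    uint32_t total = 0;
    for (const std::string& s : vec)
        total += static_cast<uint32_t>(s.size() + 1);

    std::vector<iovec> iov;
    iov.reserve(1 + 2 * vec.size());
    auto add = [&iov](const void* p, size_t n) {
        iovec v;
        v.iov_base = const_cast<void*>(p);  // writev() only reads from these buffers
        v.iov_len = n;
        iov.push_back(v);
    };
    add(&total, sizeof total);
    for (const std::string& s : vec) {
        add(s.data(), s.size());  // the string's own bytes, no copy
        add(&nul, 1);             // separator
    }

    // writev() rejects more than IOV_MAX entries (1024 on Linux); a real
    // implementation would split large vectors across several calls.
    long max_iov = sysconf(_SC_IOV_MAX);
    if (max_iov > 0 && iov.size() > static_cast<size_t>(max_iov))
        return false;

    // This sketch treats a short write as failure rather than resuming it.
    ssize_t want = static_cast<ssize_t>(sizeof total + total);
    return writev(fd, iov.data(), static_cast<int>(iov.size())) == want;
}
```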

Nikolai N Fetissov
This C++ zealot is not yelling. UNIX is a C system, and using legacy C APIs is entirely fine.
Billy ONeal
If the format is {number of elements, {length of element… }, {data of element… } } then `readv` can distribute the data efficiently.
Potatoswatter
No, never do this. Never ever assume anything about the input at this level.
Nikolai N Fetissov
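A sketch of the receiving side for the {count, {lengths…}, {data…}} layout suggested above, assuming host-order `uint32_t` fields (both ends share a host). The names `read_exact`, `readv_exact`, and `recv_vector` are hypothetical. Once the lengths are known, each string is resized and `readv` scatters the payload straight into the strings; every call is checked, partial reads are resumed, and a truncated stream is reported as failure rather than trusted.

```cpp
#include <algorithm>
#include <cerrno>
#include <cstdint>
#include <string>
#include <vector>
#include <sys/uio.h>  // readv, struct iovec
#include <unistd.h>   // read, sysconf

// Read exactly n bytes, retrying on short reads and EINTR.
static bool read_exact(int fd, void* p, size_t n) {
    char* out = static_cast<char*>(p);
    while (n > 0) {
        ssize_t r = read(fd, out, n);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return false;  // error or premature EOF
        out += r;
        n -= static_cast<size_t>(r);
    }
    return true;
}

// Fill every buffer in iov[0..cnt), resuming readv() after partial reads.
static bool readv_exact(int fd, iovec* iov, int cnt) {
    long lim = sysconf(_SC_IOV_MAX);
    for (;;) {
        while (cnt > 0 && iov->iov_len == 0) { ++iov; --cnt; }  // skip full/empty buffers
        if (cnt == 0) return true;
        int batch = (lim > 0 && cnt > lim) ? static_cast<int>(lim) : cnt;
        ssize_t r = readv(fd, iov, batch);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return false;
        size_t left = static_cast<size_t>(r);
        while (left > 0) {  // advance past the bytes just received
            size_t take = std::min(left, iov->iov_len);
            iov->iov_base = static_cast<char*>(iov->iov_base) + take;
            iov->iov_len -= take;
            left -= take;
            if (iov->iov_len == 0) { ++iov; --cnt; }
        }
    }
}

// Wire format: uint32 count, count x uint32 lengths, then the string bytes back to back.
bool recv_vector(int fd, std::vector<std::string>& out) {
    uint32_t count = 0;
    if (!read_exact(fd, &count, sizeof count)) return false;
    std::vector<uint32_t> lens(count);
    if (count > 0 && !read_exact(fd, lens.data(), count * sizeof(uint32_t))) return false;

    out.assign(count, std::string());
    std::vector<iovec> iov(count);
    for (uint32_t i = 0; i < count; ++i) {
        out[i].resize(lens[i]);
        iov[i].iov_base = &out[i][0];  // readv writes directly into each string
        iov[i].iov_len = lens[i];
    }
    return readv_exact(fd, iov.data(), static_cast<int>(count));
}
```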
+6  A: 

std::string does not prevent you from having NULs inside your string; you only run into trouble when you pass such strings to NUL-sensitive APIs. I suspect you would have to serialize the array by prepending the size of the array and then the length of each string on the wire.

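...
// Needs <arpa/inet.h> (htonl) and <unistd.h> (write).
// Send the element count, then each string as a 32-bit length followed by its bytes.
// htonl() takes and returns uint32_t, so use that type rather than long.
uint32_t length = htonl( static_cast<uint32_t>( vec.size() ) );
write( socket, &length, sizeof( length ) );
for ( size_t i = 0; i < vec.size(); ++i ) {
    length = htonl( static_cast<uint32_t>( vec[i].length() ) );
    write( socket, &length, sizeof( length ) );
    write( socket, vec[i].data(), vec[i].length() );
}
...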

Unpacking is done similarly:

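...
// Needs <algorithm> (std::min), <arpa/inet.h> (ntohl), <string>, <vector>, <unistd.h> (read).
std::vector<std::string> vectorRead;
uint32_t size = 0;
read( socket, &size, sizeof( size ) );
size = ntohl( size );
for ( uint32_t i = 0; i < size; ++i ) {
    std::string stringRead;
    uint32_t length = 0;
    read( socket, &length, sizeof( length ) );
    length = ntohl( length );
    // read() may return fewer bytes than requested, so loop until the whole string has arrived.
    while ( 0 < length ) {
        char buffer[1024];
        ssize_t cread = read( socket, buffer, std::min<size_t>( sizeof( buffer ), length ) );
        if ( cread <= 0 )
            break;  // EOF or error: stop instead of looping forever
        stringRead.append( buffer, cread );
        length -= cread;
    }
    vectorRead.push_back( stringRead );
}
...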
David Smith
I can guarantee that no NULs will be inside the strings.
Mike
Prepending the lengths has the added benefit of telling you how much data to read from the socket. It is easier to read the data you need than to search for the nul delimiters in the string. You might get the best of both worlds by forming all the strings into one and then writing the length of that string to the socket so the remote side knows how many bytes to read and then breaking it down after reading the blob.
David Smith
I really hope this is not real code but just an algorithm. You really really *really* want to check for errors after each system call here.
Nikolai N Fetissov
Thanks for the reminder Nikolai, error checking is left as an exercise for the reader. ;)
David Smith
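Taking up the error-checking point, here is a sketch of David's wire format (a network-order count, then a network-order length and the bytes for each string) with every system call checked and short reads/writes resumed. `write_all`, `read_all`, `send_strings`, and `recv_strings` are hypothetical names. Because each string carries its own length, embedded NULs are preserved.

```cpp
#include <arpa/inet.h>  // htonl, ntohl
#include <cerrno>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>
#include <unistd.h>     // read, write

// write() may send fewer bytes than asked or be interrupted; loop until done.
static bool write_all(int fd, const void* p, size_t n) {
    const char* in = static_cast<const char*>(p);
    while (n > 0) {
        ssize_t w = write(fd, in, n);
        if (w < 0 && errno == EINTR) continue;
        if (w <= 0) return false;  // error (or no progress)
        in += w;
        n -= static_cast<size_t>(w);
    }
    return true;
}

// read() may return fewer bytes than asked; 0 means the peer closed early.
static bool read_all(int fd, void* p, size_t n) {
    char* out = static_cast<char*>(p);
    while (n > 0) {
        ssize_t r = read(fd, out, n);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return false;
        out += r;
        n -= static_cast<size_t>(r);
    }
    return true;
}

bool send_strings(int fd, const std::vector<std::string>& vec) {
    uint32_t len = htonl(static_cast<uint32_t>(vec.size()));
    if (!write_all(fd, &len, sizeof len)) return false;
    for (const std::string& s : vec) {
        len = htonl(static_cast<uint32_t>(s.size()));
        if (!write_all(fd, &len, sizeof len) || !write_all(fd, s.data(), s.size()))
            return false;
    }
    return true;
}

bool recv_strings(int fd, std::vector<std::string>& vec) {
    uint32_t count = 0;
    if (!read_all(fd, &count, sizeof count)) return false;
    vec.clear();
    for (uint32_t i = ntohl(count); i > 0; --i) {
        uint32_t len = 0;
        if (!read_all(fd, &len, sizeof len)) return false;
        std::string s(ntohl(len), '\0');
        if (!s.empty() && !read_all(fd, &s[0], s.size())) return false;
        vec.push_back(std::move(s));
    }
    return true;
}
```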
A: 

There is no way to send a vector object itself via a socket, even on the same machine (or even within the same process, for that matter). There are two issues with this:

  1. vector and string both maintain internal pointers to raw memory. This precludes sending the vector<string> to another process.
  2. The dtors of the vector and string will want to delete that memory. Socket operations do a memcpy of your object (including the values of the raw pointers), so you will get a double deletion.

So the rule is this: in order to send an object via a socket, it must be able to be memcpy'd. There are several ways to do this:

  1. Serialize the vector. Tools like ICE (http://www.zeroc.com/) are good at generating these serializations, with the obvious overhead.
  2. Create something with the same interface as vector and string, but that can be memcpy'd.
  3. Create read-only versions of something that looks like vector. The send side can use a regular vector; the recv side can reinterpret_cast the recv buffer as the read-only implementation.

Number 2 is very difficult to do in general, but it is possible with certain limitations. For high-performance apps, you aren't going to be using vector in any case.

Number 3 applies to virtually all the use cases out there, in that the reader rarely modifies the contents of the recv buffer. If the reader does not need random-access iterators and can live with forward iterators, the serialization is pretty easy: allocate one buffer that can hold all the strings, plus an integer for each string denoting its length, plus one int for the size of the vector.

The result can be reinterpret_cast'd to a user defined structure that is a read only collection of read only strings. So without too much trouble you can at least get O(1) on the read side.
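A sketch of option 3, assuming C++17 (`std::string_view`) and a layout in which each length immediately precedes its string; `build_buffer` and `StringListView` are hypothetical names. Instead of `reinterpret_cast`ing the buffer to a struct as suggested above, this version decodes the integers with `std::memcpy`, because a length that follows a variable-length string is generally not aligned.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Sender side: [uint32 count][uint32 len][bytes][uint32 len][bytes]...
std::string build_buffer(const std::vector<std::string>& vec) {
    std::string buf;
    auto put_u32 = [&buf](uint32_t v) { buf.append(reinterpret_cast<const char*>(&v), sizeof v); };
    put_u32(static_cast<uint32_t>(vec.size()));
    for (const std::string& s : vec) {
        put_u32(static_cast<uint32_t>(s.size()));
        buf += s;
    }
    return buf;
}

// Receiver side: a read-only, forward-only view over the received bytes.
// Nothing is copied; each element is a string_view into the buffer.
class StringListView {
    // memcpy rather than a pointer cast: lengths stored after variable-length
    // strings are generally not 4-byte aligned.
    static uint32_t load(const char* p) {
        uint32_t v;
        std::memcpy(&v, p, sizeof v);
        return v;
    }
    const char* data_;

public:
    explicit StringListView(const char* data) : data_(data) {}
    uint32_t size() const { return load(data_); }

    class iterator {
        const char* p_;  // points at the current element's length field
        uint32_t left_;  // elements remaining, including the current one
    public:
        iterator(const char* p, uint32_t left) : p_(p), left_(left) {}
        std::string_view operator*() const { return std::string_view(p_ + 4, load(p_)); }
        iterator& operator++() { p_ += 4 + load(p_); --left_; return *this; }
        bool operator!=(const iterator& o) const { return left_ != o.left_; }
    };

    iterator begin() const { return iterator(data_ + 4, size()); }
    iterator end() const { return iterator(nullptr, 0); }
};
```

Reading the element count and walking the elements touch only the received buffer, so the receive side does no per-string allocation.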

To get O(1) on the send side, you would have to go with method 2. I've done this, knowing that my app will never use strings longer than X and that the vector will never hold more than Y items. The trick is that by fixing the capacity, I never have to go to the heap for memory. The downside is that you send the entire capacity of each string, not just what was used. However, in many cases just sending everything is far faster than trying to compact it, especially if you are on the same machine; in that case you could simply place this structure in shared memory and notify the recv app to look for it.
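A minimal sketch of method 2 under the kind of limits described above; `FixedStringVector` and `Msg` are hypothetical names, and the limits 16 and 64 are arbitrary stand-ins for Y and X. Because the object holds no pointers, it can be sent with `write(fd, &msg, sizeof msg)` and received with a full read of `sizeof msg` bytes. The static_asserts make the contrast with std::string explicit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Fixed capacity: at most MaxItems strings of at most MaxLen bytes each.
// No pointers inside, so the whole object can be memcpy'd or written to a socket as-is.
template <std::size_t MaxItems, std::size_t MaxLen>
struct FixedStringVector {
    struct Item {
        std::uint32_t len;
        char data[MaxLen];
    };
    std::uint32_t count = 0;
    Item items[MaxItems] = {};  // zeroed so unused capacity doesn't leak stack garbage

    bool push_back(const std::string& s) {
        if (count == MaxItems || s.size() > MaxLen) return false;  // capacity exceeded
        items[count].len = static_cast<std::uint32_t>(s.size());
        std::memcpy(items[count].data, s.data(), s.size());
        ++count;
        return true;
    }
    std::string at(std::size_t i) const { return std::string(items[i].data, items[i].len); }
};

using Msg = FixedStringVector<16, 64>;
static_assert(std::is_trivially_copyable<Msg>::value, "Msg must be safe to memcpy");
static_assert(!std::is_trivially_copyable<std::string>::value, "std::string is not");
```

The cost is the one the answer mentions: every message is `sizeof(Msg)` bytes no matter how much of the capacity is used.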

You may want to look at Boost.Interprocess for more ideas on how to make containers that can be shoved through sockets without serialization.

Lance Diduck
A: 

The solution I ended up taking was serializing the vector of strings in the form <string1>\0<string2>\0...<stringN>\0 (sending the length of that string beforehand). While David correctly points out that this will not work when a std::string contains a NUL, I can guarantee that will not be the case for my application.
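An end-to-end sketch of this approach, assuming blocking sockets and a host-order `uint32_t` length prefix (the answer does not say how the length was encoded). `send_nul_separated` and `recv_nul_separated` are hypothetical names. The sender builds the prefix and the NUL-separated strings into one buffer, so normally a single `write()` carries everything; the receiver reads the length, then exactly that many bytes, then splits.

```cpp
#include <cerrno>
#include <cstdint>
#include <string>
#include <vector>
#include <unistd.h>  // read, write

// Sender: "<len><s1>\0<s2>\0...<sN>\0", len = payload size after the prefix.
// Assumes no string contains '\0'.
bool send_nul_separated(int fd, const std::vector<std::string>& vec) {
    std::string msg(sizeof(uint32_t), '\0');  // placeholder for the length prefix
    for (const std::string& s : vec) {
        msg += s;
        msg += '\0';
    }
    uint32_t len = static_cast<uint32_t>(msg.size() - sizeof(uint32_t));
    msg.replace(0, sizeof len, reinterpret_cast<const char*>(&len), sizeof len);
    for (size_t off = 0; off < msg.size();) {  // usually one iteration
        ssize_t w = write(fd, msg.data() + off, msg.size() - off);
        if (w < 0 && errno == EINTR) continue;
        if (w <= 0) return false;
        off += static_cast<size_t>(w);
    }
    return true;
}

// Receiver: read the length, then exactly that many bytes, then split on '\0'.
bool recv_nul_separated(int fd, std::vector<std::string>& vec) {
    auto read_exact = [fd](char* p, size_t n) {
        while (n > 0) {
            ssize_t r = read(fd, p, n);
            if (r < 0 && errno == EINTR) continue;
            if (r <= 0) return false;  // error or premature EOF
            p += r;
            n -= static_cast<size_t>(r);
        }
        return true;
    };
    uint32_t len = 0;
    if (!read_exact(reinterpret_cast<char*>(&len), sizeof len)) return false;
    std::string blob(len, '\0');
    if (len > 0 && !read_exact(&blob[0], len)) return false;
    vec.clear();
    size_t start = 0;
    while (start < blob.size()) {
        size_t end = blob.find('\0', start);
        if (end == std::string::npos) return false;  // malformed: missing terminator
        vec.push_back(blob.substr(start, end - start));
        start = end + 1;
    }
    return true;
}
```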

Mike