views:

433

answers:

5

I'm doing some maintenance work and ran across something like the following:

std::string s;
s.resize( strLength );  
// strLength is a size_t with the length of a C string in it. 

memcpy( &s[0], str, strLength );

I know using &s[0] would be safe if it was a std::vector, but is this a safe use of std::string?

+13  A: 

A std::string's allocation is not guaranteed to be contiguous under the current standard, but the new standard forces it to be. In practice, neither I nor Herb Sutter know of an implementation that does not use contiguous storage.

EDITED TO ADD: Following the comments below, it is guaranteed by the current standard if strLength > 0, otherwise invokes undefined behavior. It would not be guaranteed if you did str.begin() or &*str.begin(), but for &s[0] the standard defines operator[] as:

Returns: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const version returns charT(). Otherwise, the behavior is undefined.

This is where the undefined behavior would occur if strLength == 0.

Continuing on, data() is defined as:

Returns: ... a pointer to the initial element of an array whose first size() elements equal the corresponding elements of the string controlled by *this ...

Since you resize, you should have a contiguous array of at least strLength. Note that the memcpy is not copying a null terminator, but if it did would have not been guaranteed to work.

Todd Gardner
I have not been following the standard for the last few months, but it was my impression was this was still in the 0x draft, and therefor not actually yet required (or will be if a library chooses to only implemented '03).
Todd Gardner
James McNellis
I think that might be right; the std defect 530 says operator[] is contiguous but the iterator interface is not guaranteed to be, and quotes 23.4.4. I am digging out my standard to check.
Todd Gardner
I skipped right over the defect link in Sutter's post, that's why I missed it. In any case, the defect says "we almost require contiguity already," (key word: almost) and I don't see how its reference to multiset is relevant (basic_string is a sequence with random access iterators). However, I think the important thing to take away is that "given the existence of data(), and the definition of operator[] and at in terms of data, I don't believe it's possible to write a useful and standard- conforming basic_string that isn't contiguous."
James McNellis
Todd, did you mean to say that you (and Sutter) *do not* know of an implementation? If you *do* know of an implementation, could you name it for us?
Rob Kennedy
@Rob - fixed, thanks. @James - Yeah, I think I understand where they are going with it (though I do not understand the 23.4.4 ref). (strike a part about interesting edge cases, I was wrong about that)
Todd Gardner
Roger Pate
@Roger: Thank you for clearing that up!
James McNellis
+5  A: 

Technically, no, since std::string is not required to store its contents contiguously in memory.

However, in almost all implementations (every implementation of which I am aware), the contents are stored contiguously and this would "work."

James McNellis
Can you identify some implementations where it wouldn't work?
Rob Kennedy
Nope. But you could make such an implementation if you wanted.
James McNellis
@Neil: Do you have a link/reference to that TC?
James McNellis
Aargh - sorry, brain going - I'm thinking of vector, not string. Apologies all round.
anon
James McNellis
+1  A: 

Legal? Maybe, maybe not. Safe? Probably, but maybe not. Good code? Well, let's not go there...

Why not just do:

std::string s = str;

...or:

std::string s(str);

...or:

std::string s;
std::copy( &str[0], &str[strLen], std::back_inserter(s));

...or:

std::string s;
s.assign( str, strLen );

?

John Dibling
or s.assign( str, strLen );
anon
good, updated w/assign
John Dibling
`std::string s (str, strLen);` (Shortest form identical, in case of embedded nulls or lacking null termination, to the original behavior from the question.)
Roger Pate
+2  A: 

This is generally not safe, regardless of whether the internal string sequence is stored in memory continuously or not. There's might be many other implementation details related to how the controlled sequence is stored by std::string object, besides the continuity.

A real practical problem with that might be the following. The controlled sequence of std::string is not required to be stored as a zero-terminated string. However, in practice, many (most?) implementations choose to oversize the internal buffer by 1 and store the sequence as a zero-terminated string anyway because it simplifies the implementation of c_str() method: just return a pointer to the internal buffer and you are done.

The code you quoted in your question does not make any effort to zero-terminate the data it copied into the internal buffer. Quite possibly it simply doesn't know whether zero-termination is necessary in this implementation of std::string. Quite possibly it relies on the internal buffer being filled with zeros after the call to resize, so the extra character allocated for the zero-terminator by the implementation is conveniently pre-set to zero. All this is implementation detail, meaning that this technique depends of some rather fragile assumptions.

In other words, in some implementations you'd probably have to use strcpy, not memcpy to force the data into the controlled sequence like that. While in some other implementations you'd have to use memcpy and not strcpy.

AndreyT
After the call to `resize` you can be quite sure that the internal string is or isn't null-terminated as the implementation requires. After a call to `resize` after all you must have a valid string of n characters (padded with zero characters as needed). - However, it shows a lack of understanding for the `std::string` class: memcpy is used either out of ignorance or as a misguided attempt for performance (because of the `resize` call the code ends up assigning values to the buffer twice).
UncleBens
@UncleBens: I don't understand your first sentence. In any case, yes, the language standard guarantees that the size-increasing `resize` call pads the string with zeros. However, the standard guarantees the padding only up to the requested size (`strLength` in this case), but there's no guarantee in the standard for that extra character, if the implementation allocates one.
AndreyT
A: 

The code might work, but more by luck than judgement, it makes assumptions about the implementation that are not guaranteed. I suggest determining the validity of the code is irrelevant while it is a pointless over complication that is easily reduced to just:

std::string s( str ) ;

or if assigning to an existing std::string object, just:

s = str ;

and then let std::string itself determine how to achieve the result. If you are going to resort to this sort of nonsense, then you may as well not be using std::string and stick to since you are reintroducing all the dangers associated with C strings.

Clifford
I actually can't be sure the string being assigned is null terminated. So the best I could do will probably be s.assign( ptr, ptrLength); which is still an improvement I think.
ceretullis
Use the constructor form: `std::string s (str, strLen);`
GMan