I spent about 4 hours yesterday trying to fix this issue in my code. I simplified the problem to the example below.

The idea is to store a string in a stringstream ending with std::ends, then retrieve it later and compare it to the original string.

#include <sstream>
#include <iostream>
#include <string>

int main( int argc, char** argv )
{
    const std::string HELLO( "hello" );

    std::stringstream testStream;

    testStream << HELLO << std::ends;

    std::string hi = testStream.str();

    if( HELLO == hi )
    {
        std::cout << HELLO << "==" << hi << std::endl;
    }

    return 0;
}

As you can probably guess, the above code when executed will not print anything out.

Although, if printed out, or looked at in the debugger (VS2005), HELLO and hi look identical, their .length() in fact differs by 1. That's what I am guessing is causing the "==" operator to fail.

My question is why. I do not understand why std::ends is an invisible character added to string hi, making hi and HELLO different lengths even though they have identical content. Moreover, this invisible character does not get trimmed with boost trim. However, if you use strcmp to compare .c_str() of the two strings, the comparison works correctly.

The reason I used std::ends in the first place is because I've had issues in the past with stringstream retaining garbage data at the end of the stream. std::ends solved that for me.

+7  A: 

std::ends inserts a null character into the stream. Getting the content as a std::string will retain that null character and create a string with that null character at the respective positions.

So indeed a std::string can contain embedded null characters. The following std::string contents are different:

ABC
ABC\0

A binary zero is not whitespace. But it's also not printable, so you won't see it (unless your terminal displays it specially).
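As a minimal sketch of the two contents above (make_string and embedded_null_demo are hypothetical names used only for illustration), constructing a std::string from a pointer plus an explicit length keeps the null character as an ordinary element:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a std::string from an explicit character count,
// so an embedded '\0' is kept instead of being treated as the end of the text.
std::string make_string(const char* data, std::size_t count)
{
    return std::string(data, count);
}

// Illustrative check of the two contents shown above.
void embedded_null_demo()
{
    // The literal "ABC" occupies four chars: 'A', 'B', 'C' and the terminator.
    const std::string plain = make_string("ABC", 3);      // ABC
    const std::string with_null = make_string("ABC", 4);  // ABC\0

    assert(plain.size() == 3);
    assert(with_null.size() == 4);
    assert(with_null[3] == '\0');  // the null is a regular element of the string
    assert(plain != with_null);    // comparison takes every character into account
}
```

Calling embedded_null_demo() passes all four asserts, which is the same situation as HELLO versus hi in the question.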

Comparing using strcmp will interpret the content of a std::string as a C string when you pass .c_str(). It will say

Hmm, characters before the first \0 (terminating null character) are ABC, so I take it the string is ABC

And thus, it will not see any difference between the two above. You are probably having this issue:

std::stringstream s;
s << "hello";
s.seekp(0);
s << "b";
assert(s.str() == "b"); // will fail!

The assert will fail, because the sequence that the stringstream uses is still the old one that contains "hello". What you did was just overwrite the first character, so s.str() returns "bello". You want to do this:

std::stringstream s;
s << "hello";
s.str(""); // reset the sequence
s << "b";
assert(s.str() == "b"); // will succeed!

Also read this answer: How to reuse an ostringstream
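A related detail, sketched here under the assumption that the stream has also been read from: once an extraction reaches the end of the data, the stream's eofbit is set and later writes fail until clear() is called. The helper name reset_stream is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: empties the buffer and resets the error state,
// so the stream can be written to again.
void reset_stream(std::stringstream& s)
{
    s.str(std::string());  // drop the old character sequence
    s.clear();             // clear eofbit/failbit left by earlier operations
}

// Illustrative use: read a number to the end of the data, then reuse the stream.
void reuse_demo()
{
    std::stringstream s;
    s << "42";

    int n = 0;
    s >> n;                // consumes everything, which sets eofbit
    assert(n == 42);
    assert(s.eof());

    reset_stream(s);
    s << "b";
    assert(s.str() == "b");
}
```

Without the clear() call, the final insertion in reuse_demo() would be rejected because the stream is no longer in a good state.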

Johannes Schaub - litb
+2  A: 

std::ends simply inserts a null character. Traditionally, strings in C and C++ are terminated with a null (ASCII 0) character, but it turns out that std::string doesn't really require this. Anyway, stepping through your code point by point, we see a few interesting things going on:

int main( int argc, char** argv )
{

The string literal "hello" is a traditional zero-terminated string constant. The constructor copies the five characters before the terminator into the std::string HELLO; the terminating null is not part of the string's contents, so HELLO.length() is 5.

    const std::string HELLO( "hello" );

    std::stringstream testStream;

We now put the string HELLO (five characters, no trailing null) into the stream, followed by a null character which is put there by std::ends.

    testStream << HELLO << std::ends;

We extract a copy of the stuff we put into the stream (the characters "hello" plus the one null character from std::ends, six characters in total).

    std::string hi = testStream.str();

We then compare the two strings using operator == on the std::string class. This operator compares the lengths and contents of the string objects, including any trailing null characters. Note that the std::string class does not treat a null character as the end of its contents. Put another way, it allows the string to contain null characters, so the null character inserted by std::ends is treated as part of the string hi.

Since hi has one trailing null character that HELLO lacks, the comparison fails.

    if( HELLO == hi )
    {
        std::cout << HELLO << "==" << hi << std::endl;
    }

    return 0;
}

Although, if printed out, or looked at in the debugger (VS2005), HELLO and hi look identical, their .length() in fact differs by 1. That's what I am guessing is causing the "==" operator to fail.

Reason being, the length is different by one trailing null character.

My question is why. I do not understand why std::ends is an invisible character added to string hi, making hi and HELLO different lengths even though they have identical content. Moreover, this invisible character does not get trimmed with boost trim. However, if you use strcmp to compare .c_str() of the two strings, the comparison works correctly.

strcmp is different from std::string. It dates from the early days when strings were always terminated with a null, so when it gets to the trailing null in hi it stops looking.
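The difference between the two comparisons can be sketched as follows. The helper names equal_as_c_strings and hello_with_ends are hypothetical and exist only for this illustration:

```cpp
#include <cassert>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: compares the way strcmp does, only up to the first '\0'.
bool equal_as_c_strings(const std::string& a, const std::string& b)
{
    return std::strcmp(a.c_str(), b.c_str()) == 0;
}

// Hypothetical helper: rebuilds the string hi from the question.
std::string hello_with_ends()
{
    std::stringstream stream;
    stream << "hello" << std::ends;  // std::ends inserts one '\0'
    return stream.str();
}

// Illustrative check: same text as C strings, different std::string objects.
void strcmp_vs_equality_demo()
{
    const std::string HELLO("hello");
    const std::string hi = hello_with_ends();

    assert(hi.size() == HELLO.size() + 1);  // the extra element is the '\0'
    assert(equal_as_c_strings(HELLO, hi));  // strcmp stops at the first '\0'
    assert(HELLO != hi);                    // std::string compares all six characters
}
```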

The reason I used std::ends in the first place is because I've had issues in the past with stringstream retaining garbage data at the end of the stream. std::ends solved that for me.

Sometimes it is a good idea to understand the underlying representation.

1800 INFORMATION
A: 

You're adding a NULL char to HELLO with std::ends. When you initialize hi with str() you are removing the NULL char. The strings are different. strcmp doesn't compare std::strings, it compares char* (it's a C function).

20th Century Boy
Gotta love StackOverflow - in the time it took me to write a 2 line answer someone has written War and Peace :-)
20th Century Boy
And it appears I was wrong anyway about how str() works. Back to the drawing board for me!
20th Century Boy
http://steve-yegge.blogspot.com/2008/09/programmings-dirtiest-little-secret.html ;)
1800 INFORMATION
That's a great blog! Thanks for the link! (even though it was mainly posted for 20th Century Boy) :p
DeadHead
A: 

std::ends adds a null terminator, (char)'\0'. You'd use it with the deprecated strstream classes, to add the null terminator.

You don't need it with stringstream, and in fact it screws things up. To stringstream the null terminator isn't "the special null terminator that ends a string"; it's just another character, the one with value zero. stringstream just adds it, which increases the character count (in your case) from five to six and makes the comparison to "hello" fail.
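A minimal sketch of both ways out, assuming the goal is simply to get back the text that was streamed in. The helper names through_stream and strip_trailing_nulls are hypothetical:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: streams the text without std::ends.
std::string through_stream(const std::string& text)
{
    std::stringstream stream;
    stream << text;  // no std::ends, so nothing extra ends up in the buffer
    return stream.str();
}

// Hypothetical helper: strips trailing '\0' characters from a string
// that was already built with std::ends.
std::string strip_trailing_nulls(std::string text)
{
    const std::string::size_type last = text.find_last_not_of('\0');
    if (last == std::string::npos)
        text.clear();          // empty, or nothing but null characters
    else
        text.erase(last + 1);  // drop everything after the last non-null character
    return text;
}

// Illustrative check against the string from the question.
void fixed_comparison_demo()
{
    const std::string HELLO("hello");
    assert(through_stream(HELLO) == HELLO);
    assert(strip_trailing_nulls(std::string("hello\0", 6)) == HELLO);
}
```

Dropping std::ends is the simpler option; the stripping helper is only of interest when the string comes from code that cannot be changed.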

tpdi
You don't need it with `strstream` either. `string::c_str()` always is properly NUL-terminated regardless of how the string was built.
Ben Voigt
A: 

I think a good way to compare strings is to use the std::find method. Do not mix C functions and std::string methods! (Just a piece of advice.)

Thez