
I have the following string:

index                                       0   1   2   3   4   5   6   7
std::string myString with the content of "\xff\xff\xff\x00\xff\x0d\x0a\xf5"

When I'm referring to myString[3], I get the expected '\x00' value.

But when I'm referring to myString[5], I get two values "\x0d\x0a" instead of just '\x0d'.

Even more interesting is the myString[6] value, which is the '\xf5'. This time it's like the \x0d didn't exist and the correct position was referenced.

My question would be: what is so special about the \x0d character in a std::string object? How come it is skipped when indexing? It's as if the positions were counted this way:

index                     0   1   2   3   4   5   5   6
std::string myString = "\xff\xff\xff\x00\xff\x0d\x0a\xf5"

As a side note, '\x0d' is ASCII code 13, "carriage return", and '\x0a' (ASCII code 10) is the line feed character.

UPDATE: Can it be that std::string considers "\x0d\x0a" as a single character and thus occupies only one position in the string? Is this '\x0d' a "mystery" character with regard to std::string?

ADDITIONAL INFO: http://en.wikipedia.org/wiki/Newline

+10  A: 

Are you sure this is happening with std::string? std::string::operator[] returns a reference to a single char (char&, or const char& for a const string), so how can it be returning two chars ('\x0d' and '\x0a')?

That said, "\x0d\x0a" is usually used for line endings under Windows, whereas only '\x0a' is used under Linux, so conversion of the former to the latter is relatively common under Windows -- for example, I'm thinking of the behaviour of fopen when called with "wt". I would guess something similar is happening to you.

Edit: Based on your comments on the original question, I think I can guess what's going on.

I believe your string doesn't really contain what you think it contains. You're being misled because the mechanism you're using to output the string to a file (probably ofstream?) is performing end-of-line translation. This means that a '\n' (the Unix end-of-line code) is being translated to '\r\n' (the Windows end-of-line code). The purpose of end-of-line translation is to make code more portable between operating systems. You can inhibit it by opening the file in binary mode; for ofstream, this is done by specifying the ios_base::binary flag when you open the file, but this flag is not set by default.

(See this Wikipedia article for more information on end-of-line markers on different operating systems.)

This is what I believe is going on. Your string actually contains

index                 0   1   2   3   4   5   6
myString contents  "\xff\xff\xff\x00\xff\x0a\xf5"

You're outputting it something like this:

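std::ofstream file("myfile.txt");   // text mode: ios_base::binary is not set
for (std::size_t i = 0; i < myString.size(); i++)
    file << myString[i];            // each char is subject to end-of-line translation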

Because of the end-of-line translation explained above, the '\x0a' in myString[5] is written to the file as '\x0d\x0a', and that's what is confusing you.

Martin B
Indeed I'm using a similar mechanism: std::ostringstream. Also, if you can add more info about this translation for future reference I would be grateful. For example, who actually does it, and maybe a few links on the web.
cmdev
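I've added a link to a comprehensive Wikipedia article. As to where the end-of-line translation is actually done: the standard describes std::basic_filebuf (the buffer behind `ofstream` / `ifstream`) in terms of the C fopen modes, so the translation applies to file streams opened without ios_base::binary and is typically carried out by the C runtime. `ostringstream` itself stores characters unchanged, so the translation would only show up once the string is written to a text-mode file.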
Martin B
A: 

You are probably misusing the [] operator.

The [] operator returns a reference to a single char (a const char& on a const string). However, you are probably using it as a pointer and thus getting more than one character; we need to see your actual code to confirm this.

0x00 is the null terminator of a C string, so that is probably why you are getting only one (correct) character at that index.

What happens when you get [4]?

graham.reeds
A: 

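In Visual Studio 2008 (as in any conforming implementation), the \x00 is treated as the end of the string when the string is constructed from this literal, so myString.length() returns 3. Accessing myString[5] is then out of range, which is undefined behaviour; a debug build may report it as an error.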

+9  A: 

One thing that's going wrong here is that the following line doesn't do what you expect:

std::string myString = "\xff\xff\xff\x00\xff\x0d\x0a\xf5";

This calls the std::string(const char *) constructor, which is designed to convert a C-style null-terminated string to a C++ std::string. This constructor reads bytes starting at the given pointer and copies them to the new std::string until it reaches a null byte (\x00). This is consistent with the behaviour of C functions such as strlen().

So, when your myString is constructed, it is a string of length 3, with bytes \xff, \xff, \xff. Index 3 is the position of the terminating null, so myString[3] yields '\0'; accessing indexes greater than 3 reads past the end of the string, which is undefined behaviour (a debug build with checked access may report it as a runtime error).

Note that a std::string can hold intermediate null bytes, but you cannot use the above constructor to initialise such a string because the null byte is interpreted as terminating the C-style string passed to the constructor.

It would be worth trying your code again with the \x00 byte changed to something else, just to see how it differs from what you have already described:

std::string myString = "\xff\xff\xff\x01\xff\x0d\x0a\xf5";

Also, check myString.length() after the above constructor to see what you get.

Greg Hewgill
OK, so the std::string::string(const char*) constructor reads the C string only up to the first '\0'. That explains the behaviour I was seeing on MSVC. My bad...
Abhay
Good point -- I assumed that cmdev just wanted to show us the contents of the string and wasn't giving us the code that is actually used to initialize the string... but if this is the actual initialization, you've nailed the problem.
Martin B
That's correct. What I wanted to show was only the content of the string. The exact assignment is much more complex than this simple assignment. I will add a comment on it.
cmdev
While you're at it, can you show us the complete source code that you're using to access and output `myString[5]`? If the string isn't being read from disk, can you also show us the code you're using to initialize the string?
Martin B
@cmdev: Don't add a note. Paste a small, compilable program that shows the behavior you describe so that we can paste it into our editors and play with it. Right now the most upvoted answer addresses a point you said is moot. How do you expect to get answers fitting your question if we have to guess what the question is?
sbi
+2  A: 

You create the string with the following constructor: string(char const *)

It takes a NUL-terminated C string, so it determines the length from the first 0 character.

You should use the other constructor, which takes the size explicitly, string(char const *, size_t n), by calling:

std::string myString("\xff\xff\xff\x00\xff\x0d\x0a\xf5",8);

See http://www.cplusplus.com/reference/string/string/string/ for further reading

Artyom