I'm having a weird problem with the following function, which returns a string with all the characters in it after a certain point:

string after(int after, string word) {
    char temp[word.size() - after];
    cout << word.size() - after << endl; //output here is as expected
    for(int a = 0; a < (word.size() - after); a++) {
        cout << word[a + after]; //and so is this
        temp[a] = word[a + after];
        cout << temp[a]; //and this
    }
    cout << endl << temp << endl; //but output here does not always match what I want
    string returnString = temp;
    return returnString;
}

The thing is, when the returned string is 7 chars or less, it works just as expected. When the returned string is 8 chars or more, then it starts spewing nonsense at the end of the expected output. For example, the lines

cout << after(1, "12345678") << endl;
cout << after(1, "123456789") << endl;

give this output:

7
22334455667788
2345678
2345678
8
2233445566778899
23456789�,�D~
23456789�,�D~

What can I do to fix this error, and are there any default C++ functions that can do this for me?

+6  A: 

Use the std::string::substr library function.

std::string s = "12345678";
std::cout << s.substr(1) << '\n'; // => 2345678
s = "123456789";
std::cout << s.substr(1) << '\n'; // => 23456789
Vijay Mathew
Thanks for telling me about that function! It fixed the problem! Edit: nvm, I see others' answers :)
wrongusername
+4  A: 

The behavior you're describing would be expected if you copy the characters into the string but forget to tack a null character at the end to terminate the string. Try adding a null character to the end after the loop, and make sure you allocate enough space (one more character) for the null character. Or, better, use the string constructor overload which accepts not just a char * but also a length.
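As a rough sketch of the last suggestion, the (pointer, length) constructor can be used like this. The name after_with_length is made up for illustration, std::vector<char> stands in for the variable-length array (which is not standard C++), and returning an empty string for an out-of-range position is an assumption, not something the question specifies.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies the tail of word into a raw buffer, then
// builds the result with the (pointer, length) constructor, so no null
// terminator is needed.
std::string after_with_length(std::size_t pos, const std::string& word) {
    if (pos >= word.size()) {
        return std::string();              // assumption: nothing left to copy
    }
    const std::size_t n = word.size() - pos;
    std::vector<char> temp(n);             // exactly n chars, no room for '\0'
    for (std::size_t a = 0; a < n; ++a) {
        temp[a] = word[a + pos];
    }
    // The length tells the constructor where to stop reading.
    return std::string(temp.data(), n);
}
```

Calling after_with_length(1, "123456789") should give "23456789" with no trailing garbage, because the constructor never looks for a terminator.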

Or, even better, use std::string::substr: it will be easier and probably more efficient.

string after(int after, string word) { 
  return word.substr (after);
}

BTW, you don't need an after method, since exactly what you want already exists on the string class.

Now, to answer your specific question about why this only showed up on the 8th and later characters, it's important to understand how "C" strings work. A "C" string is a sequence of bytes which is terminated by a null (0) character. Library functions (like the string constructor you use to copy temp into a string instance which takes a char *) will start reading from the first character (temp[0]) and will keep reading until the end, where "the end" is the first null character, not the size of the memory allocation. For example, if temp is 6 characters long but you fill up all 6 characters, then a library function reading that string to "the end" will read the first 6 characters and then keep going (past the end of the allocated memory!) until it finds a null character or the program crashes (e.g. due to trying to access an invalid memory location).
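To see the "first null character" rule without touching memory you don't own, here is a small sketch using a buffer with a null in the middle. Both function names are invented for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: how many chars a reader sees when it is given
// only a pointer and has to look for the terminator.
std::size_t chars_seen_from_pointer() {
    const char buf[] = "abc\0def";         // 8 bytes: a b c \0 d e f \0
    return std::strlen(buf);               // stops at the first '\0'
}

// Hypothetical helper: how many chars end up in a std::string when the
// length is passed explicitly.
std::size_t chars_seen_with_length() {
    const char buf[] = "abc\0def";
    // sizeof(buf) - 1 drops only the final terminator added by the literal.
    return std::string(buf, sizeof(buf) - 1).size();
}
```

The first function reports 3 and the second 7: the pointer-only reader has no idea that the buffer is longer than its first null, and equally no idea where the buffer ends if there is no null at all.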

Sometimes you may get lucky: if temp was 6 characters long and the first byte in memory after the end of your allocation happened to be a zero, then everything would work fine. If however the byte after the end of your allocation happened to be non-zero, then you'd see garbage characters. The garbage isn't truly random (often the same bytes will be there every time, since they're left by operations like previous function calls, which are consistent from run to run of your program), but if you're reading uninitialized memory there's no way of knowing what you'll find there. In a bounds-checking environment (e.g. Java or C#, or C++'s std::string when accessed through at()), an attempt to read beyond the bounds of an allocation will throw an exception. But "C" strings don't know where their end is, leaving them vulnerable to problems like the one you saw, or more nefarious problems like buffer overflows.
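For std::string specifically, the checked accessor is at(); operator[] does no bounds checking. A minimal sketch of the difference in behavior, with at_rejects being a made-up helper name:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if std::string::at refuses index i.
bool at_rejects(const std::string& s, std::size_t i) {
    try {
        (void)s.at(i);                     // checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;                       // i >= s.size()
    }
}
```

With s = "abc", index 2 is accepted and index 3 is rejected, whereas a raw char array would let the read at index 3 and beyond go through silently.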

Finally, a logical follow-up question you'd probably ask: why exactly 8 bytes? Since you're trying to access memory that you didn't allocate and didn't initialize, what's in that RAM is whatever the previous user of that RAM left there. On 32-bit and 64-bit machines, memory is generally allocated in 4- or 8-byte chunks. So it's likely that the previous user of that memory location stored 8 bytes of zeros there (e.g. one 64-bit integer zero). But the next location in memory had something different left there by the previous user. Hence your garbage characters.

Moral of the story: when using "C" strings, be very careful about your null terminators and buffer lengths!

Justin Grant
Thank you! Could you please explain why it stopped working when it reached 8 chars in length?
wrongusername
+2  A: 

Your string temp is not null-terminated. You need temp[word.size() - after] = '\0'; after the loop (the loop variable a is out of scope by then). You also need to allocate word.size() - after + 1 chars to accommodate the null character.
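Put together, a hedged sketch of the original function with both changes applied might look like the following. The name after_fixed is invented here, std::vector<char> replaces the non-standard variable-length array, and the early return for a position past the end is an assumption about the desired behavior.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical corrected version of the asker's function.
std::string after_fixed(std::size_t after, const std::string& word) {
    if (after > word.size()) {
        return std::string();              // assumption: out of range gives ""
    }
    const std::size_t n = word.size() - after;
    std::vector<char> temp(n + 1);         // one extra char for the terminator
    for (std::size_t a = 0; a < n; ++a) {
        temp[a] = word[a + after];
    }
    temp[n] = '\0';                        // terminate before handing out a char*
    return std::string(temp.data());       // now stops at the right place
}
```

after_fixed(1, "123456789") should return "23456789", matching what substr(1) gives.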

Naveen
A: 

You're not null-terminating your char array. C-style strings (i.e., char arrays) need to have a null character (i.e., '\0') at the end so functions using them know when to stop.

I think this is basically your after() function, modulo some fudging of indexes:

string after(int after, string word) {
  return word.substring(after);
}
Jack
@Jack the function name is substr, not substring.
Vijay Mathew