I've been working on a senior project for the last several months now, and a major sticking point in our team's development process has been dealing with rifts between Visual-C++ and gcc. (Yes, I know we all should have had the same development environment.) Things are about finished up at this point, but I ran into a moderate bug just today that had me wondering whether Visual-C++ is easier on newbies (like me) by design.

In one of my headers, there is a function that relies on strtok to chop up a string, do some comparisons and return a string with a similar format. It works a little something like the following:

int main()  
{  
    string a, b, c;  
    //Do stuff with a and b.  
    c = get_string(a,b);  
}   

string get_string(string a, string b)
{
    const char *a_ch, *b_ch;
    a_ch = strtok(a.c_str(), ",");
    b_ch = strtok(b.c_str(), ",");
}

strtok is infamous for being great at tokenizing, but equally great at destroying the original string to be tokenized. Thus, when I compiled this with gcc and tried to do anything with a or b afterwards, I got unexpected behavior, since the separator had been completely removed from the string. Here's an example in case I'm unclear: if I set a = "Jim,Bob,Mary" and b = "Grace,Soo,Hyun", they would end up as a = "JimBobMary" and b = "GraceSooHyun" instead of staying the same like I wanted.

However, when I compiled this under Visual C++, I got back the original strings and the program executed fine.

I tried dynamically allocating memory to the strings and copying them the "standard" way, but the only way that worked was using malloc() and free(), which I hear is discouraged in C++. While I'm curious about that, the real question I have is this: Why did the program work when compiled in VC++, but not with gcc?

(This is one of many conflicts that I experienced while trying to make the code cross-platform.)

Thanks in advance!

-Carlos Nunez

+6  A: 

This is an example of undefined behavior. You're passing the result of string::c_str(), a const char*, to strtok, which takes a char*. By modifying the contents of the std::string data, you're invoking undefined behavior (your compiler should be rejecting this, or at least warning about it, unless you're casting).

When are you checking the value of a and b? In get_string, or in main? get_string is passed copies of a and b, so strtok will most likely not alter the originals in main. However, it could, as you are invoking undefined behavior.
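
To illustrate the pass-by-value point, here is a minimal sketch. The helper name `replace_commas` is hypothetical and not part of the original code; it modifies its parameter only through the std::string interface, which is the well-defined case where the caller's string is guaranteed to stay intact.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not from the original post): `s` is passed by value,
// so the function owns a private copy and may modify it freely through the
// std::string interface.
std::string replace_commas(std::string s) {
    std::replace(s.begin(), s.end(), ',', ' ');
    return s;
}
```

Calling `replace_commas(a)` from main leaves `a` unchanged. That guarantee only covers modifications made through the string's own interface; writing through the pointer returned by c_str() is outside it.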

The "right way" to do this is to use malloc/free or new[]/delete[]. You're using a C function, so you're already guilty of the same crime as you would be using malloc/free. A relatively elegant yet safe way to approach this is:

char *ap = strdup(a.c_str());
const char *a_ch = strtok(ap, ",");
/* do whatever it is you do */
free(ap);

Also bear in mind that strtok uses global state, so it won't play well with threads.

Joey Adams
Why `new[]/delete[]` over a much simpler `std::vector`?
GMan
In this case, std::vector wouldn't be much simpler than allocating with strdup() and freeing with free(). I guess std::vector might be a little simpler than new[]/delete[], though.
Joey Adams
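
For reference, the std::vector variant raised in the comments above might look roughly like the following. This is only a sketch: the function name `split_with_strtok` and the choice to return a vector of tokens are assumptions made for illustration, not something from the original code.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes `input` on commas with strtok, working on a
// private, writable copy so the caller's std::string is never touched.
std::vector<std::string> split_with_strtok(const std::string& input) {
    // Copy the characters plus the terminating '\0' into a mutable buffer.
    std::vector<char> buf(input.c_str(), input.c_str() + input.size() + 1);

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buf.data(), ","); tok != nullptr;
         tok = std::strtok(nullptr, ",")) {
        tokens.push_back(tok);  // copies the token out of the buffer
    }
    return tokens;  // buf is released automatically, no free() needed
}
```

The buffer is freed when it goes out of scope, even if an exception is thrown part-way through. strtok still relies on global state here, so the threading caveat applies unchanged.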
+3  A: 

strtok overwrites the delimiter that ends each token with a null character. That is not something you can do to constant data.

To make your code safe and cross-platform consider using boost::tokenizer.
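
If pulling in boost is not an option, a non-destructive split can also be sketched with the standard library alone. Note that this is a stand-in built on std::getline, not boost::tokenizer's API, and the name `split_copy` is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on `delim` without modifying it.
// Uses std::getline on a string stream rather than boost::tokenizer.
std::vector<std::string> split_copy(const std::string& input, char delim) {
    std::vector<std::string> tokens;
    std::istringstream stream(input);  // the stream holds its own copy
    std::string token;
    while (std::getline(stream, token, delim)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Unlike strtok, this keeps empty fields between consecutive delimiters, so "a,,b" yields three tokens with an empty one in the middle.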

Kirill V. Lyadvinsky
I'm amazed this did not get upvoted before. The current "big" answer does talk at length about the problem and why it happens, but its suggested solution is kind of crappy...
Matthieu M.
Didn't know about that, though I shouldn't have been surprised; boost has everything. Thanks a lot!
Carlos Nunez
A: 

I think the code is working because of differences in the string implementations. The VC++ string implementation must be making copies when you pass strings to a function that could potentially modify them.

vipul lal
Another hypothesis is that the code optimizer eliminated the `strtok()` call, when it deduced that there was no legal way to observe the result of the call (dead code optimization).
MSalters
I would doubt this very much. How is `string.c_str()` to know how it is going to be used?
UncleBens