I have a string that I would like to tokenize. But the strtok() function requires my string to be a char*.
How can I do this quickly?
token = strtok(str.c_str(), " "); fails because c_str() returns a const char*, not a char*.
To start with, you may want to mention the language involved.
But given your syntax, whatever language you are talking about is very likely to already have a tokenizer built in for the standard String class. Use that.
Edit: I mentioned Split, but of course you'll have to be using managed C++ in Visual Studio for that to work. You can look around for standard library tokenizers as well if you need a more cross-platform solution.
Some posters do not seem to feel you need to provide any data on the language used for any question, but I say this: how is a user with a similar issue supposed to find the answer later if they are searching by language? You should all remember that Stack Overflow is not here to answer one user's question, but those of the many users who come afterwards...
At least use the C++ tag (which I just realized I had the power to add thanks to the community wiki model here, and have done so).
Duplicate the string, tokenize it, then free it.
char *dup = strdup(str.c_str());
token = strtok(dup, " ");
// ... use token here: it points into dup, so it is invalid after free() ...
free(dup);
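To make the order of operations explicit, here is a minimal sketch that walks over every token and copies each one out before the duplicate is freed. The helper name tokenize_with_strtok is made up for illustration, and strdup() is assumed to be available (it is POSIX, not ISO C++).

```cpp
#include <stdlib.h>   // free
#include <string.h>   // strtok, strdup (POSIX, assumed available)
#include <string>
#include <vector>

// Hypothetical helper: splits `str` on any character in `delims`.
std::vector<std::string> tokenize_with_strtok(const std::string& str, const char* delims) {
    std::vector<std::string> tokens;
    char* dup = strdup(str.c_str());   // writable copy; strtok will modify it
    if (dup == NULL) {
        return tokens;                 // allocation failed, nothing to tokenize
    }
    // First call passes the buffer, later calls pass NULL to continue.
    for (char* token = strtok(dup, delims); token != NULL; token = strtok(NULL, delims)) {
        tokens.push_back(token);       // copies the token into a std::string
    }
    free(dup);                         // safe: no pointer into dup is used after this
    return tokens;
}
```

Because each token is copied into a std::string, nothing refers to the freed buffer afterwards. Note that strtok() keeps hidden static state, so this sketch should not be called from several threads at once.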
I suppose the language is C, or C++...
strtok, IIRC, replaces separators with \0. That's why it cannot take a const string. To work around that "quickly", if the string isn't huge, you can just strdup() it. That is wise anyway if you need to keep the string unaltered (which is what the const suggests...).
On the other hand, you might want to use another tokenizer, perhaps hand rolled, less violent on the given argument.
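As a rough idea of what such a hand-rolled tokenizer could look like, here is a sketch built on std::string::find_first_of and find_first_not_of. The name tokenize_const is hypothetical; the input is taken by const reference and never modified.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `str` on any character in `delims`
// without modifying `str`. Empty tokens are skipped, as with strtok.
std::vector<std::string> tokenize_const(const std::string& str, const std::string& delims) {
    std::vector<std::string> tokens;
    // Find the first character that is not a delimiter.
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos) {
        // Find the delimiter that ends this token (npos if the token runs to the end).
        std::string::size_type end = str.find_first_of(delims, start);
        tokens.push_back(str.substr(start, end - start));
        // Skip the run of delimiters before the next token.
        start = str.find_first_not_of(delims, end);
    }
    return tokens;
}
```

For example, tokenize_const("  hello, world  ", " ,") yields the two tokens "hello" and "world".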
If boost is available on your system (I think it's standard on most Linux distros these days), it has a Tokenizer class you can use.
If not, then a quick Google turns up a hand-rolled tokenizer for std::string that you can probably just copy and paste. It's very short.
And, if you don't like either of those, then here's a split() function I wrote to make my life easier. It'll break a string into pieces using any of the chars in "delim" as separators. Pieces are appended to the "parts" vector:
// Assumes #include <string>, #include <vector> and using namespace std.
void split(const string& str, const string& delim, vector<string>& parts) {
    size_t start, end = 0;
    while (end < str.size()) {
        start = end;
        while (start < str.size() && (delim.find(str[start]) != string::npos)) {
            start++; // skip leading delimiters
        }
        end = start;
        while (end < str.size() && (delim.find(str[end]) == string::npos)) {
            end++; // skip to end of word
        }
        if (end-start != 0) { // just ignore zero-length strings.
            parts.push_back(string(str, start, end-start));
        }
    }
}
Assuming that by "string" you're talking about std::string in C++, you might have a look at the Tokenizer package in Boost.
#include <iostream>
#include <string>
#include <sstream>
int main()
{
    std::string myText("some-text-to-tokenize");
    std::istringstream iss(myText);
    std::string token;
    // getline() reads up to the next '-' and stores the piece in token
    while (std::getline(iss, token, '-'))
    {
        std::cout << token << std::endl;
    }
    return 0;
}
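If the tokens are separated by whitespace rather than by a single character, operator>> on the same kind of stream does the job, since it skips any run of spaces, tabs, or newlines. A small sketch, where the helper name split_whitespace is made up for illustration:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on runs of whitespace.
std::vector<std::string> split_whitespace(const std::string& text) {
    std::istringstream iss(text);
    std::vector<std::string> tokens;
    std::string token;
    // operator>> skips leading whitespace, then reads one word.
    while (iss >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Unlike getline() with a delimiter, this never produces empty tokens, which may or may not be what you want.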
Or, as mentioned, use boost for more flexibility.
EDIT: the const_cast below is used only to demonstrate the effect of strtok() when applied to a pointer returned by string::c_str().
You should not use strtok() this way, since it modifies the string it tokenizes, which may lead to undesired, if not undefined, behaviour, as the C string "belongs" to the string instance.
#include <cstring>   // strtok
#include <string>
#include <iostream>
int main(int ac, char **av)
{
    std::string theString("hello world");
    std::cout << theString << " - " << theString.size() << std::endl;
    //--- this cast *only* to illustrate the effect of strtok() on std::string
    char *token = strtok(const_cast<char *>(theString.c_str()), " ");
    std::cout << theString << " - " << theString.size() << std::endl;
    return 0;
}
After the call to strtok(), the space appears to have been "removed" from the string: it was overwritten with a null character ('\0'), which does not print, but the length remains unchanged.
>./a.out
hello world - 11
helloworld - 11
Therefore you have to resort to a native C++ mechanism, a duplicate of the string, or a third-party library, as previously mentioned.
First off I would say use boost tokenizer.
Alternatively if your data is space separated then the string stream library is very useful.
But both the above have already been covered.
So, as a third, C-like alternative, I propose copying the std::string into a buffer that can be modified.
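// Needs <cstring>, <string> and <vector>.
std::string data("The data I want to tokenize");
// Create a buffer of the correct length (including the terminating '\0'):
std::vector<char> buffer(data.size()+1);
// Copy the string into the buffer
strcpy(&buffer[0],data.c_str());
// Tokenize: the first call takes the buffer, later calls take NULL
for (char *token = strtok(&buffer[0]," "); token != NULL; token = strtok(NULL," "))
{
    // use token here; it points into buffer
}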