Please explain how the strtok() function works. The manual says it breaks the string into tokens, but I am unable to understand from the manual what it actually does.

I added watches on str and *pch to check how it works. On the first pass through the while loop, the contents of str were only "this". How did the output shown below get printed on the screen?

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}

Output:

Splitting string "- This, a sample string." into tokens:
This
a
sample
string
+1  A: 

The first time you call strtok, you pass it the string to tokenize. To get the following tokens, you pass NULL instead, and you keep doing so for as long as it returns a non-NULL pointer.

The strtok function remembers the string you provided on the first call. (This is really dangerous in multithreaded applications.)

tibur
A: 

strtok works by skipping any leading delimiter characters and then replacing the first delimiter found after the token (in this case, one of " ,.-") with a NUL byte, i.e. '\0'. It internally remembers a pointer to the character after that, so it can continue later (when you call it with a NULL argument).

Please look at this implementation for details. This version works by calling the much safer strtok_r function, which you should really use instead of strtok for reasons of safety.

larsmans
+1  A: 

strtok modifies its input string. It places null characters ('\0') in it so that it will return bits of the original string as tokens. In fact strtok does not allocate memory. You may understand it better if you draw the string as a sequence of boxes.

xpmatteo
A: 

strtok replaces the delimiter that ends each token (any of the characters in the second argument) with a NUL character ('\0'), and a NUL character also marks the end of a string.

http://www.cplusplus.com/reference/clibrary/cstring/strtok/

Patrick
+1  A: 

strtok() divides the string into tokens, i.e. the characters between one delimiter and the next form one token. In your case, the leading "-" and " " are delimiters and are skipped, so the first token starts at "T" and ends at the next delimiter, ",". Here you get "This" as output. Similarly, the rest of the string is split into tokens at the spaces, and the last token ends at ".".

Sachin Shanbhag
So the ending condition for one token becomes the starting point of the next token? Also, is a NUL character placed where the token ends?
fahad
@fahad - Yes, all the delimiters you have will be replaced by a NUL character, as other people have also suggested.
Sachin Shanbhag
If all the delimiters are replaced by NUL, then why does the string contain "-this"? It should contain "\0".
fahad
@fahad - It only replaces the delimiter characters with NUL, not all the characters between delimiters. It's kind of splitting the string into multiple tokens. You get "This" because it's between two specified delimiters, and not "-this".
Sachin Shanbhag
So a NUL is placed where the second delimiter was?
fahad
@Fahad - Yes, absolutely. All spaces, "," and "-" are replaced by NUL because you have specified these as delimiters, as far as I understand.
Sachin Shanbhag
I observed str[0] and str[1]. str[1] should be '\0' as you said, because str[0] is '-', but there was a space there.
fahad
A: 

The strtok runtime function works like this:

The first time you call strtok, you provide the string that you want to tokenize:

char s[] = "this is a string";

In the above string, a space seems to be a good delimiter between words, so let's use that:

char* p = strtok(s, " ");

What happens now is that 's' is searched until the space character is found; the first token ('this') is returned and p points to that token (string).

In order to get the next token and continue with the same string, NULL is passed as the first argument, since strtok keeps a static pointer into the string you passed previously:

p = strtok(NULL," ");

p now points to 'is'

And so on until no more spaces can be found; then the rest of the string is returned as the last token, 'string'.

More conveniently, you could write it like this instead to print out all the tokens:

for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
  puts(p);
}
Anders K.
So it does not actually place a NUL character inside the string? Why does my watch show that the string is left with only "THIS"?
fahad
It does indeed replace the ' ' it found with '\0'. And it does not restore the ' ' later, so your string is ruined for good.
Arkadiy
A: 

strtok will tokenize a string, i.e. convert it into a series of substrings.

It does that by searching for delimiters that separate these tokens (or substrings). And you specify the delimiters. In your case, you want ' ' or ',' or '.' or '-' to be the delimiter.

The programming model to extract these tokens is that you hand strtok your main string and the set of delimiters. Then you call it repeatedly, and each time strtok returns the next token it finds, until it reaches the end of the main string, at which point it returns NULL. Another rule is that you pass the string in only the first time, and NULL on subsequent calls. This is how you tell strtok whether you are starting a new tokenizing session with a new string or retrieving tokens from the previous session. Note that strtok remembers its state for the tokenizing session, and for this reason it is not reentrant or thread-safe (you should be using strtok_r instead). Another thing to know is that it actually modifies the original string: it writes '\0' over the delimiter that ends each token.

One way to invoke strtok, succinctly, is as follows:

char str[] = "this, is the string - I want to parse";
char delim[] = " ,-";
char* token;

for (token = strtok(str, delim); token; token = strtok(NULL, delim))
{
    printf("%s\n", token);
}

Result:

this
is
the
string
I
want
to
parse
Ziffusion