How do I parse tokens from an input string? For example:

char *aString = "Hello world";

I want the output to be:

"Hello" "world"

+6  A: 

Take a look at strtok, part of the standard library.

tgamblin
+7  A: 

You are going to want to use strtok - here is a good example.
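
As a minimal sketch of that approach (not the linked example): the helper name split_words and its parameters are made up for illustration. Note that strtok writes into the buffer it scans, so the text has to live in a writable array rather than in a string literal like the one in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `text` in place on spaces with strtok and
 * stores up to `max_tokens` pointers into `text` in `tokens`.
 * Returns the number of tokens stored. */
size_t split_words(char *text, char *tokens[], size_t max_tokens)
{
    assert(text != NULL && tokens != NULL);

    size_t count = 0;
    /* The first call passes the buffer; later calls pass NULL to continue. */
    for (char *tok = strtok(text, " ");
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, " ")) {
        tokens[count++] = tok;
    }
    return count;
}
```

With char buf[] = "Hello world"; and an array of two or more pointers, split_words(buf, tokens, 2) returns 2 and leaves tokens[0] pointing at "Hello" and tokens[1] at "world".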

Andrew Hare
Note that strtok() is not thread safe.
lillq
You can use strtok_r(), which is thread safe. Both are documented in the same manpage.
Nathan Fellman
+3  A: 

For re-entrant versions, you can use either strtok_s with Visual Studio or strtok_r on Unix.
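
A rough sketch of why the re-entrant form matters, assuming a POSIX system that provides strtok_r: each loop carries its own save pointer, so one tokenization can be nested inside another. The function name and the ';' / ' ' input format are invented for the example. Microsoft's strtok_s takes the same three arguments, so the same shape should carry over there.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts space-separated words across ';'-separated
 * records, modifying `text` in place. Each loop keeps its own save pointer,
 * so the inner loop does not disturb the outer one. */
size_t count_words_in_records(char *text)
{
    assert(text != NULL);

    size_t words = 0;
    char *outer_state = NULL;
    for (char *rec = strtok_r(text, ";", &outer_state);
         rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {
        char *inner_state = NULL;
        for (char *w = strtok_r(rec, " ", &inner_state);
             w != NULL;
             w = strtok_r(NULL, " ", &inner_state)) {
            words++;
        }
    }
    return words;
}
```

Doing the same nesting with plain strtok would not work, because the inner calls overwrite the single hidden position that the outer loop depends on.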

Leon Sodhi
+1  A: 

Keep in mind that strtok is very hard to get right, because it:

  • Modifies the input
  • Replaces each delimiter with a null terminator
  • Merges adjacent delimiters, and of course,
  • Is not thread safe.
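
A small sketch that makes the first three points observable; count_tokens is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok finds in `text`.
 * `text` is modified in place: the delimiter that ends each token
 * is overwritten with '\0'. */
size_t count_tokens(char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    size_t count = 0;
    for (char *tok = strtok(text, delims);
         tok != NULL;
         tok = strtok(NULL, delims)) {
        count++;
    }
    return count;
}
```

Given char buf[] = "a,,b";, count_tokens(buf, ",") returns 2 rather than 3, because the empty field between the two commas is skipped. Afterwards strlen(buf) is 1, since the first comma has been overwritten with a null terminator.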

You can read about this alternative.

dirkgently
+2  A: 

strtok is the easy answer, but what you really need is a lexer that does it properly. Consider the following:

  • are there one or two spaces between "hello" and "world"?
  • could that in fact be any amount of whitespace?
  • could that include vertical whitespace (\n, \f, \v) or just horizontal (space, \t, \r)?
  • could that include any Unicode whitespace characters?
  • if there were punctuation between the words, ("hello, world"), would the punctuation be a separate token, part of "hello,", or ignored?

As you can see, writing a proper lexer is not straightforward, and strtok is not a proper lexer.

Other solutions could be a single-character state machine that does precisely what you need, or a regex-based solution that generalizes the job of locating words versus gaps. There are many ways.
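
One possible shape for the state-machine idea, as a sketch only: it reads one character at a time, switches between "in a gap" and "in a word", and records each word as a pointer plus a length, so the input is never modified. The struct token type, the scan_words name, and the choice of isspace (which only knows the current C locale's whitespace, not Unicode) are all assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token type: a view into the input, not a copy. */
struct token {
    const char *start;
    size_t length;
};

enum state { IN_GAP, IN_WORD };

/* Scans `text` one character at a time and records up to `max_tokens`
 * whitespace-separated words in `out`. Returns the number recorded. */
size_t scan_words(const char *text, struct token *out, size_t max_tokens)
{
    assert(text != NULL && out != NULL);

    enum state st = IN_GAP;
    size_t count = 0;

    for (const char *p = text; ; p++) {
        /* The terminating '\0' is treated as a gap so the last word closes. */
        int is_gap = (*p == '\0') || isspace((unsigned char)*p);

        if (st == IN_GAP && !is_gap) {
            if (count == max_tokens)
                break;              /* no room for another word */
            out[count].start = p;   /* a word begins here */
            st = IN_WORD;
        } else if (st == IN_WORD && is_gap) {
            out[count].length = (size_t)(p - out[count].start);
            count++;                /* the word just ended */
            st = IN_GAP;
        }

        if (*p == '\0')
            break;
    }
    return count;
}
```

Because the tokens are not null-terminated, they can be printed with printf("%.*s", (int)tok.length, tok.start). Treating punctuation as its own token would mean adding further states or character classes.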

And of course, all of this depends on what your actual requirements are, and I don't know them, so start with strtok. But it's good to be aware of the various limitations.

Paul Beckingham