What is a good open source C word tokenizer library?
I am looking for something like
Tokenize("there are three apples. One is orange, the other is blue,"
         " and, finally, the last is yellow!")
with the output not containing any punctuation.
lex/flex is the classic tool, but it may be somewhat heavyweight for what you're doing.
If the only need is to strip the punctuation, I'd use a for loop that outputs (whatever that means in your context) the source string character by character, skipping any character for which ispunct() returns true.
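Here is a minimal sketch of that loop, assuming "output" means copying into a caller-supplied buffer. The names strip_punct and split_words are made up for illustration and do not come from any library. The second function is an extra step, not part of the suggestion above: it splits the cleaned string on whitespace with strtok() to get the individual words.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, dropping every character for
 * which ispunct() is true. dst must have room for strlen(src) + 1 bytes.
 * Returns dst. */
char *strip_punct(char *dst, const char *src)
{
    char *out = dst;

    for (; *src != '\0'; ++src) {
        /* Cast to unsigned char: passing a negative char value to
         * ispunct() is undefined behavior. */
        if (!ispunct((unsigned char)*src))
            *out++ = *src;
    }
    *out = '\0';
    return dst;
}

/* Hypothetical helper: split buf in place on whitespace and store up to
 * max_words pointers into words. Returns the number of words stored.
 * buf is modified (strtok writes '\0' over the separators). */
size_t split_words(char *buf, char **words, size_t max_words)
{
    static const char separators[] = " \t\r\n";
    size_t count = 0;
    char *word = strtok(buf, separators);

    while (word != NULL && count < max_words) {
        words[count++] = word;
        word = strtok(NULL, separators);
    }
    return count;
}
```

To use it, call strip_punct() on the input, then pass the resulting buffer to split_words() with an array of char pointers. Note that skipping punctuation joins the pieces around it, so "don't" becomes "dont". If you would rather split there, treat punctuation as a separator instead of dropping it. Also, ispunct() follows the current C locale and only looks at single bytes, and strtok() is not reentrant.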