tags:

views:

30

answers:

1

I Want to know what is the algorithmic complexity of following 1. Java String tokenizer 2. C++ STL based tokenizer 3. strtok.

Is there any faster algorithm then rudimentary strtok to tokenize a string based on custom delimeter.

A: 

Regarding Java, there are 3 main techniques for tokenizing (String.split(), StringTokenizer and StreamTokenizer). if you refer to the java.util.StringTokenizer class (which tokenizes by breaking the input string S on every occurrence of a character from a given string D), then the complexity is O(|S|*|D|). I.e., if you have only one delimiter char, this will be linear.

Note that the other tokenizers are more powerful in their abilities. String.split() for example can split around any pattern matching a given regex.

Eyal Schneider
Is there any other algorithm which is better than O(|S|*|D|)
Avinash
@Avinash: Theoretically yes. A simple implementation can store the delimiters in a hashtable and this results in O(|S|) time in average. I suppose that this is not really useful since the set of delimiters is usually very small.
Eyal Schneider