Hi,

I just wrote a program that tokenizes a char array using pointers. The program only needed to work with a space as a delimiter character. I just turned it in and got full credit, but after turning it in I realized that this program only worked if the delimiter character was a space.

My question is, how could I make this program work with every delimiter character?

The function I've shown you below returns a pointer to the next word in the char array. This is what I believe I need to change if it is to work with all delimiter characters.

Thanks!

Code:

char* StringTokenizer::Next(void)
{
pNextWord = pStart;

if (*pStart == '\0') { return NULL; }

while (*pStart != delim)
{
    pStart++;
}

if (*pStart == '\0') { return NULL; }

*pStart = '\0';
pStart++;

return pNextWord;
}

The printing loop in main:

// this loop will display the tokens
while ( ( nextWord = tk.Next ( ) ) != NULL )
{
    cout << nextWord << endl;
}
A: 

The simplest way is to change your

while (*pStart != delim)

to something like

while (*pStart != ' ' && *pStart != '\n' && *pStart != '\t')

Or, you could make delim a string, and create a function that checks if a char is in the string:

bool isDelim(char c, const char *delim) {
   while (*delim) {
      if (*delim == c)
         return true;
      delim++;
   }
   return false;
}

while ( !isDelim(*pStart, " \n\t") ) 

Or, perhaps the best solution is to use one of the prebuilt functions for doing all this, such as strtok.

bramp
Using a prebuilt function is good advice -- but strtok really isn't. It has a lousy interface so code that uses it correctly is messy, it modifies the string it's passed, and it requires something like a thread-local variable (or two) to work correctly in a multithreaded environment. Nearly the *only* standard library function that's worse is `gets()`.
Jerry Coffin
@bramp: it sounds like the homework assignment is to actually implement a version of strtok (for better or worse). Also, remember to add the `*pStart != '\0'` check to the main while loop.
Jason Govig
@Jason Yep, that was basically the assignment. Like I said, turned it in and got full credit, but I'm trying to make it work with all delimiter characters, not just spaces.
Alex
A: 

Hmm...this doesn't look quite right:

if (*pStart = '\0')

The condition can never be true. I'm guessing you intended == instead of =? You also have a bit of a problem here:

while (*pStart != delim)

If the last word in the string isn't followed by a delimiter, this is going to run off the end of the string, which will cause serious problems.

Edit: Unless you really need to do this on your own, consider using a stringstream for the job. It already has all the right mechanisms in place and is quite heavily tested. It does add overhead, but that is quite acceptable in a lot of cases.
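As a rough illustration of the stringstream route, here is a small sketch. The function names splitOn and splitOnWhitespace are made up for this example. Note the limitation: std::getline splits on one delimiter character at a time and operator>> splits on whitespace only, so neither handles an arbitrary set of delimiters on its own.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits text on a single delimiter character using std::getline.
// Empty fields between adjacent delimiters are kept.
std::vector<std::string> splitOn(const std::string &text, char delim) {
    std::vector<std::string> tokens;
    std::istringstream in(text);
    std::string token;
    while (std::getline(in, token, delim)) {
        tokens.push_back(token);
    }
    return tokens;
}

// Splits text on runs of whitespace using operator>>.
std::vector<std::string> splitOnWhitespace(const std::string &text) {
    std::vector<std::string> tokens;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Unlike the pointer-based version, this copies each token into a std::string and leaves the input untouched.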

Jerry Coffin
A: 

Not compiled, but I'd do something like this.

const int N = 8; // must equal the number of delimiters listed below
char delimList[N] = {' ',',','.',';', '|', '!', '$', '\n'};//all delims here.

char* StringTokenizer::Next(void)
{
    if (*pStart == '\0') { return NULL; }

    pNextWord = pStart;

    while (1){  
        for (int x = 0; x < N; x++){
            if (*pStart == delimList[x]){ //this is it.
                *pStart = '\0';
                pStart++;
                return pNextWord;
            }

        }
        if ('\0' == *pStart){ //last word.. maybe.
                return pNextWord;   
        }
        pStart++;
    }
}

// (!compiled).
essbeev
A: 

Just change

while (*pStart != delim)

to this line

while (*pStart != '\0' && strchr(" \t\n", *pStart) == NULL)

The standard strchr function (declared in the string.h header) searches a C string (the first argument) for a character (the second argument) and returns a pointer to the first occurrence of that character, or NULL if it is not found. So strchr(" \t\n", *pStart) == NULL means that the current character (*pStart) does not occur in the string " \t\n" and therefore is not a delimiter. The explicit *pStart != '\0' test is still needed, because strchr treats the terminating null character as part of the string and would report '\0' as found. (Change the delimiter string " \t\n" to suit your needs, of course.)

This is a short and simple way to test whether a given character belongs to a (usually small) set of characters of interest, and it uses only a standard function.

By the way, this works with std::string as well as with C strings. Declare a const std::string holding a value like " \t\n", then replace the strchr call with the find method of that string and compare the result against std::string::npos instead of NULL.
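For illustration, a std::string version of the same membership test could look like this. The function name tokenizeWithFind is hypothetical, and returning the tokens in a std::vector is a choice made for this sketch rather than something the original class does.

```cpp
#include <string>
#include <vector>

// Collects the tokens of text, treating every character in delims as a
// delimiter. Runs of consecutive delimiters produce no empty tokens.
std::vector<std::string> tokenizeWithFind(const std::string &text,
                                          const std::string &delims) {
    std::vector<std::string> tokens;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (delims.find(text[i]) == std::string::npos) {
            current += text[i];            // not a delimiter: extend the token
        } else if (!current.empty()) {
            tokens.push_back(current);     // delimiter ends the current token
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);         // last token without a trailing delimiter
    }
    return tokens;
}
```

Because std::string tracks its own length, no separate check for the terminating null character is needed here.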

ib
A: 

I assume that we want to stick to C instead of C++. The functions strspn and strcspn are good for tokenizing by a set of delimiters. You can use strcspn to find where the next separator begins (i.e. where the current token ends) and then strspn to find where the separator ends (i.e. where the next token begins). Loop until you reach the end.
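A sketch of that loop follows. It calls the C functions from C++ so the tokens can be collected in a std::vector, which is a convenience for this example only. The function name splitSpans is made up.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Tokenizes text without modifying it: strspn skips a run of delimiters,
// strcspn measures the token that follows.
std::vector<std::string> splitSpans(const char *text, const char *delims) {
    std::vector<std::string> tokens;
    const char *p = text;
    for (;;) {
        p += std::strspn(p, delims);                // skip the separator
        if (*p == '\0') break;                      // nothing left to read
        std::size_t len = std::strcspn(p, delims);  // length of the token
        tokens.push_back(std::string(p, len));
        p += len;                                   // now at a separator or the end
    }
    return tokens;
}
```

In plain C the same two calls apply; you would hand out pointer and length pairs, or write '\0' into the buffer as the original Next does.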

Tronic
What I described is pretty much equivalent to what strtok does. Using those two functions is not really harder than using strtok and they avoid the nasty problems that there are with strtok. However, what you are required to do in the exercise might slightly differ from that because delimiters are not getting combined.
Tronic