
What is the easiest way to parse a comma-separated list where there can be zero elements between two commas (that is, empty fields)? The C string could look like

1, 3, 4, 5, 6, 7, 8, ....

But could also look like

, , , , , , , , , ...

I've tried something like:

/* an array, not a pointer to a string literal: strtok modifies its input */
char original[] = "1, 3, 4, 5, 6, 7, 8, ....";
char *tok = strtok(original, " ,");
while (tok != NULL) {
    while (*tok != '\0') {
        // dostuff
        tok++;
    }
    tok = strtok(NULL, " ,");
}

This apparently only works if there are elements between the commas. For instance, I've noticed that the first item in the list is skipped if it is empty.

I've tried other solutions like strchr(), but this gets very ugly, and I think there is an easier way.

Thanks

Update:

After some testing I noticed that tokenizing on "," alone seemed to work in all cases except when the first item was missing, so I'm handling that as a special case.

char original[] = "1, 3, 4, 5, 6, 7, 8, ....";
if (*original == ',') {
    // dostuff (the first item is empty)
}
char *tok = strtok(original, ",");
while (tok != NULL) {
    while (*tok != '\0') {
        // dostuff
        tok++;
    }
    tok = strtok(NULL, ",");
}

Thanks for your input and your help. (Maybe I should have given this more careful thought before posting.)

A: 
strtok cannot distinguish between `,` and `,,`.
fastcodejava
Hi, thanks for your reply. There is also a space after each comma, so if I just use "," as the delimiter, it will still skip the first element.
monkeyking
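A small sketch illustrates what these two comments describe. The helper `count_strtok_tokens` is hypothetical, written only for this illustration, and it assumes inputs shorter than 256 bytes. Splitting on "," alone, strtok drops a leading empty field and collapses adjacent commas, while a field containing only a space still comes back as a token.

```c
#include <string.h>

/* Hypothetical helper: count the tokens strtok() returns when a copy of
   `s` is split on ",". strtok() modifies its argument, so it works on a
   local copy (assumption: inputs are shorter than 256 bytes). */
size_t count_strtok_tokens(const char *s)
{
    char buf[256];
    size_t n = 0;

    strncpy(buf, s, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ","))
        ++n;
    return n;
}
```

For example, "1,,3" yields only 2 tokens although it has 3 fields, and ", , ," yields 2 tokens (the two single spaces) although it has 4 fields. Meanwhile "1, , 3" yields 3, because the middle field is a space rather than nothing.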
A:

You might want to look into the nonstandard strsep, which is designed to be a replacement for strtok which allows parsing of empty fields. See also the glibc manual chapter on Finding Tokens in a String. It's available on many systems (various BSDs, Linux, Mac OS X), but is not standardized, so I believe it may not be present on Windows or Solaris.
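A minimal sketch of the strsep approach follows. It assumes a glibc- or BSD-style C library that provides `strsep()` and `strdup()`, and the helper name `count_fields_strsep` is made up for illustration. Unlike strtok, strsep returns an empty string for each empty field, so every field is counted.

```c
#define _DEFAULT_SOURCE   /* asks glibc's <string.h> to declare strsep()/strdup() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the fields in `s`, empty ones included,
   using the non-standard strsep(). */
size_t count_fields_strsep(const char *s)
{
    char *copy = strdup(s);   /* strsep() writes '\0' over the delimiters */
    char *rest = copy;
    size_t n = 0;

    if (copy == NULL)
        return 0;
    /* strsep() returns "" for an empty field and NULL only once the
       whole string has been consumed */
    while (strsep(&rest, ",") != NULL)
        ++n;
    free(copy);
    return n;
}
```

With this, "1,,3" has 3 fields and ", , ," has 4, matching what a reader would expect from the commas alone.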

Brian Campbell
Brian, I don't think it's a very good idea to suggest non-standard functions to newbies.
Eli Bendersky
I think it's reasonable to point out non-standard functions that do what they need, as long as you include a caveat that they may not be available on all systems. I usually leave it up to the person asking the question to determine if my answer is sufficient for them, as long as I provide enough information to tell that it might have portability issues. Do you think I should put the caveat at the beginning of my answer instead of the end?
Brian Campbell
@Brian, perhaps make it more marked? I.e. like saying "warning, this will make your code non-portable". Also, since it's not standardized you can't be sure it will be on all those platforms you mention. You assume a certain environment/compiler/library, but in fact the user can have something different. This isn't the case for standard features, and standards-complying compilers.
Eli Bendersky
this is better :-)
Eli Bendersky
The problem with standards is that there's no guarantee that just because something is standard it is available (C99 in MSVC?), and there is also some functionality which has never been standardized and yet is available on every major platform. I find it's better to worry about whether it's supported on the platforms that you need support, than always restricting yourself to only standard features.
Brian Campbell
MSVC doesn't claim C99 compatibility. Additionally, I think you're making the assumption that platform == OS, which is somewhat misleading, because there can be multiple platforms (compilers) on the same OS, and some compilers run on multiple OSes. In the end, if you want your program runnable "anywhere" as much as that's feasible, sometimes the only way is to use the minimal subset supportable on most platforms. The standard (not C99 though) is the best approximation to this minimum
Eli Bendersky
You're right, I'm aware that platform != OS, but it's a convenient shortcut I use to not have to specify every last detail of the toolchain when saying where something will or won't work.
Brian Campbell
A: 

How about a simple for loop?

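/* each field runs from original[begin] up to the next ',' or the '\0';
   empty fields are visited too (end == begin) */
for (int begin = 0; original[begin]; ) {
  int end = begin;
  while (original[end] && original[end] != ',')
    ++end;

  // do something with original[begin] through original[end-1]

  // step over the comma; without this the loop never advances.
  // (an empty field after a final trailing comma is not visited)
  begin = original[end] ? end + 1 : end;
}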
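Fleshed out into a complete function, the same index-walking idea might look like the sketch below. The name `split_fields`, the `FIELD_MAX` limit and the blank-trimming are assumptions added for illustration. Because it only stops after a field that ends at '\0', it also reports a trailing empty field after a final comma.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define FIELD_MAX 32   /* hypothetical limit on the stored length of a field */

/* Hypothetical helper: split `s` on ',' into `out`, trimming surrounding
   whitespace. Every field is stored, including empty ones. Returns the
   number of fields; fields beyond `max` are counted but not stored, and
   fields longer than FIELD_MAX - 1 characters are truncated. */
size_t split_fields(const char *s, char out[][FIELD_MAX], size_t max)
{
    size_t n = 0;
    size_t begin = 0;

    for (;;) {
        size_t end = begin;
        while (s[end] != '\0' && s[end] != ',')
            ++end;

        /* trim whitespace from both ends of s[begin..end) */
        size_t lo = begin, hi = end;
        while (lo < hi && isspace((unsigned char)s[lo]))
            ++lo;
        while (hi > lo && isspace((unsigned char)s[hi - 1]))
            --hi;

        if (n < max) {
            size_t len = hi - lo;
            if (len >= FIELD_MAX)
                len = FIELD_MAX - 1;
            memcpy(out[n], s + lo, len);
            out[n][len] = '\0';
        }
        ++n;

        if (s[end] == '\0')
            break;          /* no comma left: that was the last field */
        begin = end + 1;    /* step over the comma */
    }
    return n;
}
```

For "1, , 3" this stores "1", "" and "3". For ", , ," it stores four empty strings. The input is only read, so unlike strtok it also works on string literals.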
Grumdrig
A:

If all you need to do is ignore empty "tokens", you can use the strspn function to detect whitespace-only strings. Here's an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


/* Is the given string whitespace only?
*/
int iswhitespace(char* s)
{
    return (strspn(s, " \t") == strlen(s));
}


int main()
{
    char line[] = "1, , 3, 4, 5, 6";
    char sep[] = ",";
    char* tok;

    tok = strtok(line, sep);

    while (tok)
    {
        if (iswhitespace(tok))
            printf("empty token\n");
        else
            printf("new token: %s\n", tok);

        tok = strtok(0, sep);
    }

    return 0;
}

The key idea here is to tokenize on a comma only, not on " ,", which skips the first element when it is empty. Whitespace can then be handled separately.

Of course, strtok will still skip runs of consecutive commas (as in "1,,3"). If that matters to you, you can't use strtok and will need another solution.
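One such solution, if the non-standard strsep is not available, is to write a few lines of standard C with the same behaviour. The sketch below is not a library function: `next_field` is a hypothetical name, and it mimics strsep's calling convention using only the standard `strcspn()`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical portable stand-in for strsep(), standard C only.
   Returns the next field of *rest (possibly empty), NUL-terminating it in
   place, and advances *rest past the delimiter. After the last field
   *rest becomes NULL, and further calls return NULL. */
char *next_field(char **rest, const char *delims)
{
    char *start = *rest;
    if (start == NULL)
        return NULL;

    char *end = start + strcspn(start, delims);   /* first delimiter or '\0' */
    if (*end == '\0') {
        *rest = NULL;       /* this was the final field */
    } else {
        *end = '\0';        /* terminate the field in place */
        *rest = end + 1;    /* resume after the delimiter */
    }
    return start;
}
```

Like strtok, it modifies the buffer, so it needs a writable array such as `char line[] = "1,,3";`. Unlike strtok, it keeps no hidden static state, so two different strings can be split in interleaved or nested loops.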

Eli Bendersky
+1 for submitting while I was typing up my version of this answer...
mrkj