My application produces strings like the one below. I need to split the string at each separator and get the individual values.

2342|2sd45|dswer|2342||5523|||3654|Pswt

I am using strtok to do this in a loop. For the fifth token, I am getting 5523. However, I need to account for the empty value between the two separators || as well. 5523 should be the sixth token, as per my requirement.

token = (char *)strtok(strAccInfo, "|");

for (iLoop=1;iLoop<=106;iLoop++) {

      token = (char *)strtok(NULL, "|");   

}

Any suggestions?

Thanks

Bash

+2  A: 

On a first call, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of last token as the new starting location for scanning.

To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token.

What this says is that strtok will skip any '|' characters at the beginning of a token, which makes 5523 the 5th token, as you already knew. I just thought I would explain why (I had to look it up myself). It also means that you will never get an empty token.
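To make the skipping behaviour concrete, here is a minimal sketch that counts what strtok actually reports. The helper name count_strtok_tokens and its parameters are made up for illustration; only strtok itself comes from the question.

```c
#include <stddef.h>
#include <string.h>

/* Counts the tokens strtok() reports for `input` (which it modifies)
   and stores a pointer to the nth token (1-based) in *nth_token,
   or NULL if there are fewer than n tokens.
   Hypothetical helper, not part of any library. */
int count_strtok_tokens(char *input, const char *delims,
                        int n, const char **nth_token)
{
    int count = 0;
    char *tok;

    *nth_token = NULL;
    for (tok = strtok(input, delims); tok != NULL; tok = strtok(NULL, delims)) {
        count++;
        if (count == n)
            *nth_token = tok;   /* remember the requested token */
    }
    return count;
}
```

Called on a writable copy of the string from the question with "|" as the delimiter set, this reports 7 tokens even though the string has 10 fields, and the fifth token is 5523: the three empty fields never show up.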

Since your data is set up this way, you have a couple of possible solutions:
1) Find all occurrences of || and replace them with | | (put a space in there).
2) Call strstr 5 times to find the beginning of the 5th element.

Romain Hippeau
Thanks for the information. Hopefully, I will remember this the next time I need it. :-D Your first solution skews my results a bit, because the string can contain valid fields that are just a space between pipes. The second solution might become tedious and is probably not workable, since the string may be different for different sets of data.
Bash
@Bash - Sorry I could not be of more help :(
Romain Hippeau
oh, you were a lot of help...information is power in our field, right?
Bash
A: 

Look into using strsep instead: strsep reference
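A rough sketch of how that could look, assuming a platform that provides strsep (it is a BSD/glibc extension, not ISO C). The wrapper name split_strsep and its parameters are invented for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* asks glibc to declare strsep() */
#endif
#include <stddef.h>
#include <string.h>

/* Splits `line` in place on any character in `delims`, keeping empty
   fields. Stores up to `max` field pointers in `out` and returns how
   many were stored. Hypothetical wrapper around strsep(). */
size_t split_strsep(char *line, const char *delims, char **out, size_t max)
{
    size_t n = 0;
    char *rest = line;      /* strsep() advances this past each field */
    char *field;

    while (n < max && (field = strsep(&rest, delims)) != NULL)
        out[n++] = field;
    return n;
}
```

Unlike strtok, strsep returns an empty string for each pair of adjacent delimiters, so the question's string yields 10 fields and 5523 ends up at index 5 (the sixth field).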

renata
strsep is not portable; it will not work on Windows.
Romain Hippeau
+1  A: 

That's a limitation of strtok. The designers had whitespace-separated tokens in mind. strtok doesn't do much anyway; just roll your own parser. The C FAQ has an example.

Gilles
I got some useful information from the link you posted. Thanks!
Bash
A: 

Use something other than strtok. It's simply not intended to do what you're asking for. When I've needed this, I've usually used strcspn or strpbrk and handled the rest of the tokenizing myself. If you don't mind it modifying the input string the way strtok does, it should be pretty simple. Offhand, something like this seems as if it should work:

// Warning: untested code. Should really use something with a less-ugly interface.
char *tokenize(char *input, char const *delim) { 
    static char *current;    // just as ugly as strtok!
    char *pos, *ret;
    if (input != NULL)
        current = input;

    if (current == NULL)
        return current;

    ret = current;
    pos = strpbrk(current, delim);
    if (pos == NULL) 
        current = NULL;
    else {
        *pos = '\0';
        current = pos+1;
    }
    return ret;
}
Jerry Coffin
Since the OP is only searching for one delimiter character, `strchr()` could be used instead of `strpbrk()`.
caf
I did it a little differently. Thanks anyway.
Bash
+3  A: 

In that case I often prefer a p2 = strchr(p1, '|') loop with a memcpy(s, p1, p2-p1) inside. It's fast, does not destroy the input buffer (so can be used with const char *) and is really portable (even on embedded).

EDIT: I forgot to mention that it's reentrant, and strtok isn't. (By the way, reentrancy has nothing to do with multithreading: strtok already breaks with nested loops. One can use strtok_r, but it's not as portable.)
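The strchr/memcpy idea could be sketched roughly as below. The function name get_field, its parameters and its return convention are assumptions made for this example, not something from the answer itself.

```c
#include <stddef.h>
#include <string.h>

/* Copies field number `index` (0-based) of `src` into `dst` without
   modifying `src`. Fields are separated by `sep`; empty fields count.
   Returns the field length, or -1 if the field does not exist or does
   not fit in `dst` (including the terminating '\0').
   Hypothetical helper for illustration. */
int get_field(const char *src, char sep, int index, char *dst, size_t dst_size)
{
    const char *p1 = src;
    const char *p2;
    size_t len;
    int i;

    for (i = 0; i < index; i++) {
        p2 = strchr(p1, sep);
        if (p2 == NULL)
            return -1;              /* fewer than index + 1 fields */
        p1 = p2 + 1;                /* step over the separator */
    }

    p2 = strchr(p1, sep);
    len = (p2 != NULL) ? (size_t)(p2 - p1) : strlen(p1);
    if (len >= dst_size)
        return -1;                  /* would not fit */

    memcpy(dst, p1, len);
    dst[len] = '\0';
    return (int)len;
}
```

Because the input is only read, it can be a string literal or a const buffer, and since no state is kept between calls the function can be used inside nested loops. With the question's string, field 4 comes back empty and field 5 is 5523.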

tristopia
I used your input and updated my code. Thanks! I have the code that I am using below as an answer, if you're interested.
Bash
A: 

Below is the solution that is working for me now. Thanks to all of you who responded.

I am using LoadRunner. Hence, some unfamiliar commands, but I believe the flow can be understood easily enough.

char strAccInfo[1024], *p2;
int iLoop;

Action()
{
    // This value would come from the wrsp call in the actual script.
    lr_save_string("323|90||95|95|null|80|50|105|100|45", "test_Param");

    // Store the parameter into a string - saves memory.
    strcpy(strAccInfo, lr_eval_string("{test_Param}"));

    // Get the first instance of the separator "|" in the string.
    p2 = (char *) strchr(strAccInfo, '|');

    // Start a loop - set the max loop value to more than max expected.
    for (iLoop = 1; iLoop < 200; iLoop++) {

        // Save parameter names in sequence.
        lr_param_sprintf("Param_Name", "Parameter_%d", iLoop);

        // Get the first instance of the separator "|" in the string (within the loop).
        p2 = (char *) strchr(strAccInfo, '|');

        // Save the value for the parameters in sequence.
        lr_save_var(strAccInfo, p2 - strAccInfo, 0, lr_eval_string("{Param_Name}"));

        // Keep only the text after the separator for the next pass.
        // memmove rather than strcpy, because source and destination overlap.
        memmove(strAccInfo, p2 + 1, strlen(p2 + 1) + 1);

        // Check for the last value in the string.
        if (strchr(strAccInfo, '|') == NULL) {
            lr_param_sprintf("Param_Name", "Parameter_%d", iLoop + 1);
            lr_save_string(strAccInfo, lr_eval_string("{Param_Name}"));
            iLoop = 200;
        }
    }

    return 0;
}
Bash
A: 
char *mystrtok(char **m,char *s,char c)
{
  char *p=s?s:*m;
  if( !*p )
    return 0;
  *m=strchr(p,c);
  if( *m )
    *(*m)++=0;
  else
    *m=p+strlen(p);
  return p;
}
  • reentrant
  • thread-safe
  • strictly ANSI-conforming
  • needs an otherwise unused helper pointer from the calling context

e.g.

char *p,*t,s[]="2342|2sd45|dswer|2342||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
  puts(t);

e.g.

char *p,*t,s[]="2,3,4,2|2s,d4,5|dswer|23,42||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
{
  char *p1,*t1;
  for(t1=mystrtok(&p1,t,',');t1;t1=mystrtok(&p1,0,','))
    puts(t1);
}

Your work :) Implement a char *c (a string of delimiters) as parameter 3.
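One possible take on that exercise, as a sketch only: swap strchr for strpbrk so the third parameter can be a set of delimiters. The name mystrtok_set is made up here. Note one deliberate difference from mystrtok: this version marks the end with a NULL helper pointer instead of stopping at an empty remainder, so it also reports a trailing empty field (and treats an empty input as one empty field).

```c
#include <stddef.h>
#include <string.h>

/* Like mystrtok, but `delims` is a set of delimiter characters.
   `state` is the caller-supplied helper pointer; pass the string as `s`
   on the first call and NULL afterwards. Returns NULL when done.
   Hypothetical variant for illustration. */
char *mystrtok_set(char **state, char *s, const char *delims)
{
    char *p = s ? s : *state;

    if (p == NULL)
        return NULL;            /* previous call returned the last field */

    *state = strpbrk(p, delims);
    if (*state != NULL)
        *(*state)++ = '\0';     /* terminate field, resume after delimiter */
    /* else: *state stays NULL, so the next call ends the iteration */
    return p;
}
```

It is used the same way as mystrtok, for example for(t=mystrtok_set(&p,s,"|,");t;t=mystrtok_set(&p,0,"|,")), and splits on either character in a single pass.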