I need to extract the components of a string of the form "blah.bleh.bloh" in ANSI C. Normally I would use strtok() for this, but I already obtain this string via strtok, and strtok is not thread-safe, so I cannot use it here.

I have written a function to manually parse the string. Here is a snippet:

for(charIndex=0; charIndex < (char)strlen(theString); charIndex++)
{
    if(theString[charIndex] == '.')
    {
        theString[charIndex] = '\0';
        osi_string_copy_n(Info[currentInfoIndex], 1024, theString, charIndex + 1 );
        currentInfoIndex++;
        theString = &theString[charIndex + 1];
    }
    charIndex++;
}

As you can see, I try to find the first occurrence of '.' and make note of the index of the character. Then I convert the '.' to a null char and copy the first string to an array.

Then I want to change the pointer to start just after where the delimiter was found, essentially giving me a new shorter string.

Unfortunately I am getting an error on the line:

theString = &theString[charIndex + 1];

The error is:

error C2106: '=' : left operand must be l-value

Why am I not allowed to move the pointer like this? Is my method flawed? Perhaps someone has a better idea for me to parse this string.

EDIT: In response to the comments, the declaration for theString is:

char theString[1024] = {0};

Also, I am guaranteed that theString will never be more than 1024 characters.

A: 

The variable "theString" must be a pointer and not an array type.

Software Monkey
+7  A: 

Assuming you defined theString as an array, try using a pointer instead. An array name is not a modifiable l-value, so you cannot assign a new address to it.

I am assuming you have a declaration similar to

char theString[100];

The easiest solution is to leave that declaration alone, and add another one:

char *str = theString;

and then use str everywhere that you currently use theString.
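
A minimal sketch of that idea, in case it helps. The function name split_in_place and the parts array are hypothetical, not from the question: the array keeps its storage, and only a separate cursor pointer moves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits buf in place on '.' and stores a pointer to each piece in parts[].
 * Returns the number of pieces stored (at most max_parts).
 * Hypothetical helper for illustration; not part of the original code. */
size_t split_in_place(char *buf, char *parts[], size_t max_parts)
{
    char *cursor = buf;   /* a pointer can be reassigned; an array name cannot */
    size_t count = 0;

    assert(buf != NULL && parts != NULL);

    while (count < max_parts) {
        char *dot = strchr(cursor, '.');
        parts[count++] = cursor;
        if (dot == NULL)
            break;          /* last piece: no more delimiters */
        *dot = '\0';        /* terminate the current piece */
        cursor = dot + 1;   /* step past the delimiter */
    }
    return count;
}
```

With char theString[1024] = "blah.bleh.bloh"; and char *parts[8];, calling split_in_place(theString, parts, 8) stores three pointers, all of which point into theString, so nothing needs to be malloc'ed or freed.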

Eddie
If I declare theString as a pointer, won't I then have to malloc space for it? Part of the reason I was declaring it with defined space is so that I wouldn't have to worry about deallocating the memory in a complicated series of if then else statements later.
Tim
That's why I recommended leaving the allocation alone and then defining a pointer to reference the array. The pointer you can change at will.
Eddie
Ahh, I understand, now. This would be a good solution to my problem. However, I actually ended up using an internally developed version of strtok that is thread-safe. Thanks for your input!
Tim
A: 

Is theString #defined for some reason, rather than being a variable? If it were replaced by a string literal by the preprocessor, that would make your statement invalid.

I'm no expert and this probably isn't it, but I just thought I'd throw it out there.

Jeremy Banks
+4  A: 

You can use strtok_r which is available on most platforms and is reentrant. This means that it does not hold internal state, and you can call it from nested loops with no trouble.
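
A rough sketch of the nested case, assuming a POSIX system where strtok_r is declared in <string.h>. The function name count_fields and the ';' record separator are made up for illustration. Each loop keeps its position in its own caller-supplied pointer, so the inner loop does not disturb the outer one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* asks <string.h> to declare strtok_r */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits text on ';' into records and each record on '.' into fields,
 * modifying text in place. Returns the total number of fields seen.
 * Hypothetical helper for illustration only. */
size_t count_fields(char *text)
{
    char *outer_state;   /* position of the outer (record) scan */
    char *inner_state;   /* position of the inner (field) scan */
    char *record;
    char *field;
    size_t fields = 0;

    assert(text != NULL);

    for (record = strtok_r(text, ";", &outer_state);
         record != NULL;
         record = strtok_r(NULL, ";", &outer_state)) {
        for (field = strtok_r(record, ".", &inner_state);
             field != NULL;
             field = strtok_r(NULL, ".", &inner_state)) {
            fields++;
        }
    }
    return fields;
}
```

The same nesting with plain strtok would fail, because the inner calls would overwrite the single hidden position that the outer loop relies on.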

Greg Hewgill
+2  A: 

There is only one true C way, the use of pointers, tight loops and arcane commands :-).

The getNext() function below will allow you to return all the components in order, followed by a NULL sentinel. You need to provide a big enough buffer to store the components. I've also included my test program so you can check it (and add more unit test cases if you wish).

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *getNext (char *pStr, char *pComp) {
    /* Special for '.' at string end. */
    if ((*pStr == '.') && (*(pStr+1) == '\0')) {
        *pComp = '\0';
        return pStr + 1;
    }

    /* Check if no components left. */
    if (*pStr == '\0')
        return NULL;

    /* Transfer component one character at a time. */
    while ((*pStr != '\0') && (*pStr != '.'))
        *pComp++ = *pStr++;
    *pComp = '\0';

    /* Skip '.' at end, if there, but not at end of string. */
    if ((*pStr == '.') && (*(pStr+1) != '\0'))
        pStr++;

    /* Return location of next component. */
    return pStr;
}

int main (int argCount, char *argVal[]) {
    int argNum;
    int compNum;
    char *newStr;
    char *strPtr;

    if (argCount < 2) {
        printf ("Usage: components <string to componentize>...\n");
        return 1;
    }
    for (argNum = 1; argNum < argCount; argNum++) {
        if ((newStr = malloc (strlen (argVal[argNum]) + 1)) == NULL) {
            printf ("Out of memory for '%s'.\n", argVal[argNum]);
        } else {
            printf ("Input string is '%s'.\n", argVal[argNum]);
            compNum = 0;
            strPtr = getNext (argVal[argNum],newStr);
            while (strPtr != NULL) {
                printf ("   Component [%3d] is '%s'.\n", ++compNum, newStr);
                strPtr = getNext (strPtr,newStr);
            }
            free (newStr);
        }
    }

    return 0;
}

Here's the output:

[fury]> components your.test.string .dot.at.start at.end. .both. no_dots ''
Input string is 'your.test.string'.
    Component [  1] is 'your'.
    Component [  2] is 'test'.
    Component [  3] is 'string'.
Input string is '.dot.at.start'.
    Component [  1] is ''.
    Component [  2] is 'dot'.
    Component [  3] is 'at'.
    Component [  4] is 'start'.
Input string is 'at.end.'.
    Component [  1] is 'at'.
    Component [  2] is 'end'.
    Component [  3] is ''.
Input string is '.both.'.
    Component [  1] is ''.
    Component [  2] is 'both'.
    Component [  3] is ''.
Input string is 'no_dots'.
    Component [  1] is 'no_dots'.
Input string is ''.
paxdiablo
A: 

The line theString = &theString[charIndex + 1]; should never have existed in the first place. Even if this line compiled and ran without error, theString[charIndex] would no longer be the next adjacent character you expect, because theString would have moved while charIndex kept its old value.

My recommendation, with nearly minimal code change:

for(charIndex=0; charIndex < strlen(theString); charIndex++)
{
    if(theString[charIndex] == '.')
    {
        theString[charIndex] = '\0';
        osi_string_copy_n(Info[currentInfoIndex], 1024, theString + subStrStart, charIndex + 1 - subStrStart);
        currentInfoIndex++;
        subStrStart = charIndex + 1;
    }
    charIndex++;
}

I am not sure what your osi_string_copy_n does, so that line is just a guess based on your original code. But if you are copying the substrings elsewhere, with the substring length passed as a function parameter, then there should be no need to null-terminate the substring in place.
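
To illustrate that last point, here is a sketch that copies each piece by length and never writes to the source string. Since I don't know osi_string_copy_n, it uses standard strcspn and memcpy instead; copy_pieces and PIECE_SIZE are hypothetical names, with 1024 taken from the buffer size in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PIECE_SIZE 1024  /* assumed to match the 1024-byte buffers in the question */

/* Copies each '.'-separated piece of src into out[], leaving src untouched.
 * Pieces longer than PIECE_SIZE - 1 are truncated. Returns the piece count
 * (at most max_pieces). Hypothetical helper for illustration only. */
size_t copy_pieces(const char *src, char out[][PIECE_SIZE], size_t max_pieces)
{
    size_t count = 0;

    assert(src != NULL && out != NULL);

    while (count < max_pieces) {
        size_t len = strcspn(src, ".");   /* length up to the next '.' or the end */
        size_t n = len < PIECE_SIZE - 1 ? len : PIECE_SIZE - 1;

        memcpy(out[count], src, n);
        out[count][n] = '\0';             /* terminate the copy, not the source */
        count++;

        if (src[len] == '\0')
            break;                        /* reached the end of the input */
        src += len + 1;                   /* skip the piece and its '.' */
    }
    return count;
}
```

Because src is const here, the same input can be parsed from several threads at once, provided each thread writes to its own out array.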

Edit:
I noticed your code has two charIndex++ statements. Is theString a double-byte string? If so, it may be more appropriate to use wchar_t.

billyswong
A: 

If you have a non-ancient libc, you have strtok_r, which is a re-entrant variant of strtok.

char *saveptr;
char *str;
for (str = strtok_r(theString, ".", &saveptr);
        str;
        str = strtok_r(NULL, ".", &saveptr)
    )
{
    printf("got: '%s'\n", str);
}

This is guaranteed not to clobber any state strtok keeps around, or any state other strtok_r calls keep (assuming they don't share your saveptr).

ephemient
Good answer. Isn't the second param of strtok_r() supposed to be a string?
paxdiablo
You're right, it's supposed to be a string of delimiters and not just a character. I'll fix that...
ephemient