views:

1243

answers:

4

See also C Tokenizer


Here is a quick substr() for C that I wrote (yes, the variable initializations needs to be moved to start of the function etc, but you get the idea)

I have seen many "smart" implementations of substr() that are simple one liner calls strncpy()!

They are all wrong (strncpy does not guarantee null termination and thus the call might NOT produce a correct substring!)

Here is something maybe better?

Bring out the bugs!

char* substr(const char* text, int nStartingPos, int nRun)
{
    char* emptyString = strdup(""); /* C'mon! This cannot fail */

    if(text == NULL) return emptyString;

    int textLen = strlen(text);

    --nStartingPos;

    if((nStartingPos < 0) || (nRun <= 0) || (textLen == 0) || (textLen < nStartingPos)) return emptyString;

    char* returnString = (char *)calloc((1 + nRun), sizeof(char));

    if(returnString == NULL) return emptyString;

    strncat(returnString, (nStartingPos + text), nRun);

    /* We do not need emptyString anymore from this point onwards */

    free(emptyString);
    emptyString = NULL;

    return returnString;
}


int main()
{
    const char *text = "-2--4--6-7-8-9-10-11-";

    char *p = substr(text, -1, 2);
    printf("[*]'%s' (\")\n",  ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 1, 2);
    printf("[*]'%s' (-2)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 3, 2);
    printf("[*]'%s' (--)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 16, 2);
    printf("[*]'%s' (10)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 16, 20);
    printf("[*]'%s' (10-11-)\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 100, 2);
    printf("[*]'%s' (\")\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    p = substr(text, 1, 0);
    printf("[*]'%s' (\")\n", ((p == NULL) ? "<NULL>" : p));
    free(p);

    return 0;
}

Output :

[*]'' (")
[*]'-2' (-2)
[*]'--' (--)
[*]'10' (10)
[*]'10-11-' (10-11-)
[*]'' (")
[*]'' (")
+1  A: 

char* emptyString = strdup(""); /* C'mon! This cannot fail? */

You need to check for null. Remember that it still must allocate 1 byte for the null character.

rlbond
didn't get you!
PoorLuzer
+4  A: 

I would say return NULL if the input isn't valid rather than a malloc()ed empty string. That way you can test whether or not the function failed or not with if(p) rather than if(*p == 0).

Also, I think your function leaks memory because emptyString is only free()d in one conditional. You should make sure you free() it unconditionally, i.e. right before the return.

As to your comment on strncpy() not NUL-terminating the string (which is true), if you use calloc() to allocate the string rather than malloc(), this won't be a problem if you allocate one byte more than you copy, since calloc() automatically sets all values (including, in this case, the end) to 0.

I would give you more notes but I hate reading camelCase code. Not that there's anything wrong with it.

EDIT: With regards to your updates:

Be aware that the C standard defines sizeof(char) to be 1 regardless of your system. If you're using a computer that uses 9 bits in a byte (god forbid), sizeof(char) is still going to be 1. Not that there's anything wrong with saying sizeof(char) - it clearly shows your intention and provides symmetry with calls to calloc() or malloc() for other types. But sizeof(int) is actually useful (ints can be different sizes on 16- and 32- and these newfangled 64-bit computers). The more you know.

I'd also like to reiterate that consistency with most other C code is to return NULL on an error rather than "". I know many functions (like strcmp()) will probably do bad things if you pass them NULL - this is to be expected. But the C standard library (and many other C APIs) take the approach of "It's the caller's responsibility to check for NULL, not the function's responsibility to baby him/her if (s)he doesn't." If you want to do it the other way, that's cool, but it's going against one of the stronger trends in C interface design.

Also, I would use strncpy() (or memcpy()) rather than strncat(). Using strncat() (and strcat()) obscures your intent - it makes someone looking at your code think you want to add to the end of the string (which you do, because after calloc(), the end is the beginning), when what you want to do is set the string. strncat() makes it look like you're adding to a string, while strcpy() (or another copy routine) would make it look more like what your intent is. The following three lines all do the same thing in this context - pick whichever one you think looks nicest:

strncat(returnString, text + nStartingPos, nRun);

strncpy(returnString, text + nStartingPos, nRun);

memcpy(returnString, text + nStartingPos, nRun);

Plus, strncpy() and memcpy() will probably be a (wee little) bit faster/more efficient than strncat().

text + nStartingPos is the same as nStartingPos + text - I would put the char * first, as I think that's clearer, but whatever order you want to put them in is up to you. Also, the parenthesis around them are unnecessary (but nice), since + has higher precedence than ,.

EDIT 2: The three lines of code don't do the same thing, but in this context they will all produce the same result. Thanks for catching me on that.

Chris Lutz
+1, .. EXCELLENT write up by the way Chris!
PoorLuzer
Your assertion about the 3 lines of code being same is wrong.strncat is guranteed to suffix a 0 after copying int the source string.strncpy is not guranteed to suffix a 0 after copying, but it will stop the copying process as soon as it comes across the first \0.... and we all know what memcpy is :-)
PoorLuzer
They are not the same, but in this context they will all produce the same result. My point is that strncat does a lot of extra (and in this case, unnecessary) work than strncpy or memcpy.
Chris Lutz
A: 

strdup could fail (though it is very unlikely and not worth checking for, IMHO). It does have another problem however - it is not a Standard C function. It would be better to use malloc.

anon
For what it's worth, strdup() is easy enough to write that you might as well use it, and use autoconf or something similar to check if you need to roll your own version.
Chris Lutz
strdup is now removed .. people were being too pedantic ;-)
PoorLuzer
+5  A: 

Your function seems very complicated for what should be a simple operation. Some problems are (not all of these are bugs):

  • strdup(), and other memory allocation functions, can fail, you should allow for all possible issues.
  • only allocate resources (memory in this case) if and when you need it.
  • you should be able to distinguish between errors and valid stings. At the moment, you don't know whether malloc() failure of substr ("xxx",1,1) or a working substr ("xxx",1,0) produces an empty string.
  • you don't need to calloc() memory that you're going to overwrite anyway.
  • all invalid parameters should either cause an error or be coerced to a valid parameter (and your API should document which).
  • you don't need to set the local emptyString to NULL after freeing it - it will be lost on function return.
  • you don't need to usr strncat() - you should know the sizes and the memory you have available before doing any copying so you can use the (most likely) faster memcpy().
  • you're use of base-1 rather than base-0 for string offsets goes against the grain of C.

The following segment is what I'd do (I rather like the Python idiom of negative values to count from the end of the string but I've kept length rather than end position).

char *substr (const char *inpStr, int startPos, int strLen) {
    /* Cannot do anything with NULL. */

    if (inpStr == NULL) return NULL;

    /* All negative positions to go from end, and cannot
       start before start of string, force to start. */

    if (startPos < 0)
        startPos = strlen (inpStr) + startPos;
    if (startPos < 0)
        startPos = 0;

    /* Force negative lengths to zero and cannot
       start after end of string, force to end. */

    if (strLen < 0)
        strLen = 0;
    if (startPos >strlen (inpStr))
        startPos = strlen (inpStr);

    /* Adjust length if source string too short. */

    if (strLen > strlen (&inpStr[startPos]))
        strLen = strlen (&inpStr[startPos]);

    /* Get long enough string from heap, return NULL if no go. */

    if ((buff = malloc (strLen + 1)) == NULL)
        return NULL;

    /* Transfer string section and return it. */

    memcpy (buff, &(inpStr[startPos]), strLen);
    buff[strLen] = '\0';

    return buff;
}
paxdiablo
+1, EXCELLENT writeup! Thanks!
PoorLuzer