tags:

views:

689

answers:

11

I want to make sure my string ends with ".foo". I am using C, a language I am not totally familiar with. The best way I have found to do it is below. Any C gurus want to make sure I'm doing this elegantly and wisely?

int EndsWithFoo(char *str)
{
    if(strlen(str) >= strlen(".foo"))
    {
        if(!strcmp(str + strlen(str) - strlen(".foo"), ".foo"))
        {
            return 1;
        }
    }
    return 0;
}
+2  A: 

The strlen(".foo")s are not required. If you really wanted to have it flexible you could use sizeof ".foo" - 1 -- a compile time constant.

Also, a null string check would be good.

dirkgently
Correct me if I'm wrong, but isn't sizeof(".foo") 5, but strlen(".foo") 4? I thought strlen was easier to read, since I'm dealing with string lengths here. And the compiler should optimize it away to a constant...How's the rest of the function look?
Specifically, it's not necessary because we already know how long ".foo" is.
Chuck
Isn't ".foo" a const char *? Even if counted as an array, it has five chars, since in array form it does have the terminating '\0'.
David Thornley
@JoeF: Right. My bad. But your function looks good to me otherwise except for a null check.
dirkgently
strlen used on string literals is usually optimised away anyway.
dreamlax
@dirkgently: What's this about NULL? I was saying that performing strlen(".foo") is optimised into a constant size_t equal to 4 using any decent compiler, since the length of the string is known at compile time. This can be verified with objdump.
dreamlax
@dreamlax: I mixed up another posting with this comment of yours. Taken the comment out. (I was just referring to input validation for the non-literal string argument.)
dirkgently
+4  A: 

I don't have access to a compiler right now, so could someone tell me if this works?

#include <stdio.h>
#include <string.h>

int EndsWithFoo(const char* s);

int
main(void)
{
  printf("%d\n", EndsWithFoo("whatever.foo"));

  return 0;
}

int EndsWithFoo(const char* s)
{
  int ret = 0;

  if (s != NULL)
  {
    size_t size = strlen(s);

    if (size >= 4 &&
        s[size-4] == '.' &&
        s[size-3] == 'f' &&
        s[size-2] == 'o' &&
        s[size-1] == 'o')
    {
      ret = 1;
    }
  }

  return ret;
}

Anyway, be sure to qualify the parameter as const, it tells everyone (including the compiler) that you don't intend to modify the string.

Bastien Léonard
+1 most optimized. I prefer such a version when 'foo' doesn't change!
dirkgently
Just a tip: If you have an internet connection, you have a C compiler available at codepad.org
John Cromartie
Murphy's Law says that ".foo" will change and at the most inopportune moment.
plinth
It's not "most optimized" as you read over the .foo part twice - once for strlen, once for comparison. Good and practical, though. +1
aib
+8  A: 

Don't call strlen more than once per string.

int EndsWith(const char *str, const char *suffix)
{
    if (!str || !suffix)
        return 0;
    size_t lenstr = strlen(str);
    size_t lensuffix = strlen(suffix);
    if (lensuffix >  lenstr)
        return 0;
    return strncmp(str + lenstr - lensuffix, suffix, lensuffix) == 0;
}

int EndsWithFoo(const char *str) { return EndsWith(str, ".foo"); }

EDIT: added NULL check for the pedantic. For the ultra pedantic, debate whether it should return non-zero if both str and suffix are both NULL.

plinth
In this case, you can get away with using strcmp() instead of strncmp() (or even memcmp()), since we know exactly how many characters are left in both strings at that point, although the speed difference will hardly be noticeable.
Adam Rosenfield
any call to strlen vanishes from the assembly anyway as soon as you turn on optimizations, so that's probably a case of premature optimization (although C strings are yucky enough to make one think about such issues)
Joey
@Johannes: How can this be, for strings other than compile-time-known string literals? Certainly you can inline the strlen code, but at some level you still need to find the length of the string. For const string literals, the compiler knows how long it is, but this is not true in general. Thoughts?
Matt J
@plinth: Your code will invoke UB when `str == NULL;`
dirkgently
wilhelmtell
A: 

I'd suggest the best way of doing it is to reverse the string then compare the first n characters.

There's any number of examples of string reversing functions out there (even Joel cites it as a standard interview question) so just implement one of those then step though the reversed strings to compare.

EDIT in response to downvotes. OK yes, this approach does require additional CPU or memory to implement, but the questioner does not indicate any such constraints and he explicitly asked for an elegant solution. Reversing the strings then comparing from the front is far more elegant than messing around finding the end of the strings and working backwards. And it's a lot easier for the next programmer along to grasp and maintain too.

Cruachan
Yes, that would work, but it's either a memory allocation for a copy or a double reverse to undo the damage.
plinth
So what? There's no indication that he's operating under under CPU or memory constrained circumstances and reversing the string to compare from the front is far more maintainable than messing around with finding the end-n position and hacking the logic from there.
Cruachan
You'd have to find the end of the string anyway to know how long it was so you'd know how to reverse it! It doesn't matter how you go about it, the problem involves the end of the string, so you'll have to find it one way or another.
dreamlax
of course, but if you reverse first the end of the string issue is effectively abstracted away into the reverse function, which is why it's a more elegant, maintainable and understandable solution
Cruachan
Except for the fact that it's horribly inefficient since you have to reverse the string twice, once to look at it and again to restore it, meaning read/write byte access for every byte in the string TWICE, just to look at the last four characters. Hardly understandable, hardly elegant at all.
dreamlax
No, because you put the reverse functionality into a function call so the function under discussion is simpler as that part is hidden. As for inefficiency - well the difference is in microseconds, programmer time is more valuable
Cruachan
Given that there is no standard C89 or C99 function to reverse a string, you'd have to implement and test this function as well as the function just to check for .foo on the end. It would be simpler and easier and take less time to implement any of the supplied solutions for this problem than yours
dreamlax
+2  A: 

You should try to avoid using strlen() multiple times for values you already know, since it's a runtime function. A #define would work here just fine. Also, doing strlen(str) multiple times is a waste: store the length in a variable so that you don't waste cycles with multiple passes through the string.

The revamped code:

#define FOO_FILE_EXTENSION_LENGTH 5 /* length includes the null terminator */

int EndsWithFoo(char *str)
{
    size_t stringLength = strlen(str);
    if(stringLength >= FOO_FILE_EXTENSION_LENGTH)
    {
        if(!strcmp(str + stringLength - FOO_FILE_EXTENSION_LENGTH, ".foo"))
        {
            return 1;
        }
    }
    return 0;
}
Reuben
Don't use a #define; use sizeof(".foo") -- or use the #define for the extension string: #define FOO_EXTN ".foo" and sizeof(FOO_EXTN).
Jonathan Leffler
@Jonathan: strlen on string literals are automatically optimised away by any decent compiler.
dreamlax
I don't trust my compilers that much - not quite. However, you're probably right; my experiences of 20 years ago shouldn't affect youngsters learning the language now. I stand by "do not use a #define for the length" as shown in this answer, though. That is simply an open invitation to trouble.
Jonathan Leffler
@Jonathan: You can verify it by using objdump. Functions like strlen being a standard C function allows C compilers to recognise its use and therefore can optimise more aggressively knowing the result of the function at compile-time. Other standard C functions are likewise optimised, I would hope.
dreamlax
Never use a #define when a sizeof will work - make the compiler do the math for you.
Nighthawk
A: 

You can also generalize like this:

int endsWith(const char* text, const char* extn)
{
    int result = 1;
    int len = strlen(text);
    int exprLen = strlen(extn);
    int index = len-exprLen;
    int count = 0;

    if(len > exprLen)
    {
     for( ; count  < exprLen; ++count)
     {
      if(text[index + count] != extn[count])
      {
       result = 0;
       break;
      }

     }
    }
    else
    {
     result = 0;
    }
    return result;
}
Naveen
I forgot there is a strncmp...
Naveen
+1  A: 

Tested code, includes the test:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int ends_with_foo(const char *str)
{
    char *dot = strrchr(str, '.');

    if (NULL == dot) return 0;
    return strcmp(dot, ".foo") == 0;
}

int main (int argc, const char * argv[]) 
{
    char *test[] = { "something", "anotherthing.foo" };
    int i;

    for (i = 0; i < sizeof(test) / sizeof(char *); i++) {
     printf("'%s' ends %sin '.foo'\n",
         test[i],
         ends_with_foo(test[i]) ? "" : "not ");
    }
    return 0;
}
stesch
Snap! Although I don't bother comparing the strcmp() result - just return it directly.
Blank Xavier
you are assuming that there is not other '.' in the input string.
Naveen
It doesn't make a difference. The strcmp() will only return 0 if there are two identical strings (e.g. the length must be the same). The compare will finish early if the strings differ in length.
Blank Xavier
@Blank Xavier: The version in the question returns 1 if it ends in ".foo". Just returning the result of strcmp() would return 0 on success.
stesch
@Naveen: strrchr() returns the last occurrence of the searched character.
stesch
Yeah, I wasn't too worried about getting the exact same return value - as long as the return value is defined, so you know what it will do, that's fine.
Blank Xavier
A: 
int EndsWithFoo( char *string )
{
  string = strrchr(string, '.');

  if( string != NULL )
    return( strcmp(string, ".foo") );

  return( -1 );
}

Will return 0 if ending with ".foo".

Blank Xavier
Nice. Short, and readable.
EvilTeach
*humble bow* you're very kind :-)
Blank Xavier
Return values should be reversed, since zero is false and non-zero is true, and the name of the function indicates Boolean return.
dreamlax
@EvilTeach: Thank you.
stesch
@dreamlax -> yes, this is true
Blank Xavier
A: 

Maybe...

bool endswith (const char* str, const char* tail)
{
  const char* foo = strrstr (str, tail);
  if (foo)
  {
     const int strlength = strlen (str);
     const int taillength = strlen (tail);
     return foo == (str + strlength - taillength);
  }
  return false;
}

endswith (str, ".foo");

By the way, the solution in the original question looks fine other than the repeated strlen calls.

Dan Olson
A: 

If there's always something beyond the dot, we could indulge in some pointer arithmetic:

int EndsWithFoo (char *str)
{
   int iRetVal = 0;
   char * pchDot = strrchr (str, '.');

   if (pchDot)
   {
      if (strcmp (pchDot+1, "foo") == 0)
      {
         iRetVal = 1;
      }
   }
   return iRetVal;
}

Of course you would probably want to add a little strlen there to check there is something beyond the dot :-)

NB - I didn't run this to check it, but it looks OK to me.

Bob Moore
+1  A: 

If you can change the signature of your function, then try changing it to

int EndsWith(char const * str, char const * suffix, int lenstr, int lensuf);

This will result in a safer, more reusable and more efficient code:

  1. The added const qualifiers will make sure you don't mistakenly alter the input strings. This function is a predicate, so I assume it is never meant to have side-effects.
  2. The suffix to compare against is passed in as a parameter, so you can save this function for later reuse with other suffixes.
  3. This signature will give you the opportunity to pass the lengths of the strings in if you already know them. We call this dynamic programming.

We can define the function like so:

int EndsWith(char const * str, char const * suffix, int lenstr, int lensuf)
{
    if( ! str && ! suffix ) return 1;
    if( ! str || ! suffix ) return 0;
    if( lenstr < 0 ) lenstr = strlen(str);
    if( lensuf < 0 ) lensuf = strlen(suffix);
    return strcmp(str + lenstr - lensuf, suffix) == 0;
}

The obvious counter-argument for the extra parameters is that they imply more noise in the code, or a less expressive code.

wilhelmtell