All,

I am trying to find a function similar to 'strstr' that searches for a substring starting from the end of the string and working towards the beginning.

Thanks

A: 

I don't believe there is one in the C string library, but it would be trivial to write your own, on one condition: you know the length of the string or it is properly terminated.
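As a rough illustration of that condition, here is a minimal sketch that searches backwards when both lengths are already known. The name find_last_n and its parameters are hypothetical, not a library function, and this is a naive comparison loop rather than a tuned implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the last occurrence of needle (nlen bytes)
   in haystack (hlen bytes). Returns NULL if there is no match.
   An empty needle matches at the very end (haystack + hlen). */
static const char *find_last_n(const char *haystack, size_t hlen,
                               const char *needle, size_t nlen)
{
    if (nlen > hlen)
        return NULL;
    /* Start at the last position where the needle could fit and walk
       back to index 0 without ever forming a pointer before haystack. */
    for (size_t i = hlen - nlen + 1; i-- > 0; ) {
        if (memcmp(haystack + i, needle, nlen) == 0)
            return haystack + i;
    }
    return NULL;
}
```

For terminated strings, a call would look like find_last_n(s, strlen(s), sub, strlen(sub)); subtracting s from a non-NULL result gives the index.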

hhafez
Can you please read the footnote of my question?
Manav Sharma
A: 

There isn't one in the standard C library. You may be able to find one on the web, or you may have to write your own.

Kinopiko
+5  A: 

I don't know of one. One of the nice things about C is that if you write your own function, it's just as fast and efficient as the library ones. (This is totally not the case in many other languages.)

You could reverse the string and the substring, and then search.

Finally, the other thing people often do when the string library isn't good enough is to move to regular expressions.

OK, I wrote both reverse() and rstrstr(), which might work if we are lucky. Get rid of __restrict for C++. You also might want to make the parameters const, but then you will need to cast the return value. To answer your comment question, you can get the index from the address of the substring by just subtracting the original string pointer from it. OK:

#include <stdlib.h>
#include <string.h>

char *reverse(const char * __restrict const s)
{
  if (s == NULL)
    return NULL;
  size_t i, len = strlen(s);
  char *r = malloc(len + 1);
  if (r == NULL)  /* allocation failed */
    return NULL;

  for(i = 0; i < len; ++i)
    r[i] = s[len - i - 1];
  r[len] = 0;
  return r;
}

char *rstrstr(char *__restrict s1, char *__restrict s2)
{
  size_t  s1len = strlen(s1);
  size_t  s2len = strlen(s2);
  char *s;

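  if (s2len > s1len)
    return NULL;
  /* Walk back from the last possible position; stop at s1 without
     decrementing the pointer past the start of the string. */
  for (s = s1 + s1len - s2len; ; --s) {
    if (strncmp(s, s2, s2len) == 0)
      return s;
    if (s == s1)
      break;
  }
  return NULL;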
}
DigitalRoss
Can you please read the footnote of my question?
Manav Sharma
"it's just as fast and efficient as the library ones" That is not true, you can very easily write code in C that is much less efficient than the library one (for example completely botch up your own "qsort" routine and make O(n^2) and turn it into insertion sort.
hhafez
Obviously my statement implies "unless you botch the implementation", but we can't make any statements about anything without including that assumption. I mean, *seriously* now.
DigitalRoss
It's easily possible for a library routine to be 10x faster than an algorithmically correct naive implementation. Try writing your own pow( ) function that delivers sub-ulp accuracy and see how you do on performance. Even in simpler functions, a naive `memcpy( )` might use byte-by-byte copies, where a library implementation might use an unrolled loop of vector copies.
Stephen Canon
You guys are missing the overall point, which is that your routine is written in C, the library routine is written in C, everyone is on a level playing field. Let's see someone implement strcmp in Perl, Python, Ruby, or even Java. That's the, ahem, *obvious*, point.
DigitalRoss
The library routine is not necessarily written in C. When I'm working on a standard library, I'm usually writing assembly. You could write a C library function in nearly any compiled language that knows about the C calling conventions on the target platform. (That said, I agree that all this is beside the point).
Stephen Canon
Your reverse() won't work with multi-byte characters.
Dan
@Dan, you're wrong. It reverses the sequence of `char` values ignoring any larger multibyte character structure, which `strstr` would ignore anyway. As long as you translate the result back into an offset into the original string, it works just fine with multibyte character strings.
R..
+1  A: 

If you can use C++, you can search strings like this:

std::string::iterator found=std::search(haystack.rbegin(), haystack.rend(), needle.rbegin(), needle.rend()).base();
// => yields haystack.begin() if not found; otherwise, an iterator one past the end of the last occurrence of needle
jpalecek
A: 

Long story short:

Nope - there is no function in the C library that does what you need.

But as others have pointed out: it's not rocket science to write such a function...

Nils Pipenbrinck
It's not rocket science to write a slow implementation, but making it fast requires some fancy algorithms.
R..
A: 

No. This is one of the places that the C++ std::string class has an obvious advantage -- along with std::string::find(), there's also std::string::rfind().

Jerry Coffin
+2  A: 

Here is one. Testing it is an exercise I'll leave to you :)

Stuart
+2  A: 

One possible, if not entirely elegant, implementation might look like:

#include <string.h>

const char* rstrstr(const char* haystack, const char* needle)
{
  size_t needle_length = strlen(needle);
  size_t haystack_length = strlen(haystack);
  const char* p;
  size_t i;

  /* A needle longer than the haystack can never match. */
  if(needle_length > haystack_length)
    return 0;

  for(p = haystack + haystack_length - needle_length; ; --p)
  {
    for(i = 0; i < needle_length; ++i) {
      if(p[i] != needle[i])
        goto next;
    }
    return p;

    next:
    /* Stop before decrementing past the start of the haystack. */
    if(p == haystack)
      break;
  }
  return 0;
}
Theo Spears
As a bit of nitpicking I would make all of those `const char *`. http://codepad.org/KzryjtRE
Kinopiko
@Kinopiko: Actually, that's not even nitpicking. Leaving the const out would make this function a pain to use for the caller if their "needle" or "haystack" is already const.
Dan Moulding
`int` should be `size_t`. But I almost want to upvote for the `goto`.
DigitalRoss
Both good suggestions, now incorporated.
Theo Spears
You need to make `haystack_end` and `p` and the return value of the function `const char *` too. Please see my paste on codepad.org.
Kinopiko
+1  A: 

Is there a C Library function to find the index to the last occurrence of a substring within a string?

Edit: As @hhafez notes in a comment below, the first solution I posted for this was inefficient and incorrect (because I advanced the pointer by target_length which worked fine in my silly test). You can find that version in the edit history.

Here is an implementation that starts at the end and works back:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char *
findlast(const char *source, const char *target) {
    const char *current;
    const char *found = NULL;

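    size_t source_length = strlen(source);
    size_t target_length = strlen(target);

    /* A target longer than the source cannot match. */
    if ( target_length > source_length ) {
        return NULL;
    }
    current = source + source_length - target_length;

    for ( ;; ) {
        if ( (found = strstr(current, target)) ) {
            break;
        }
        if ( current == source ) {
            break;
        }
        current -= 1;
    }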

    return found;
}

int main(int argc, char *argv[]) {
    if ( argc != 3 ) {
        fputs("invoke with source and search strings as arguments\n", stderr);
        return EXIT_FAILURE;
    }

    const char *found = findlast(argv[1], argv[2]);

    if ( found ) {
        printf("Last occurrence of '%s' in '%s' is at offset %ld\n",
                argv[2], argv[1], (long) (found - argv[1])
                );
    }
    return 0;
}

Output:

C:\Temp> st "this is a test string that tests this" test
Last occurrence of 'test' in 'this is a test string that tests this' is at offset 27
Sinan Ünür
But that is ugly compared to writing your own routine. What if the string is really long? Also, don't forget that he is expecting to have multiple occurrences and he expects to find them towards the end; that's why he wants to start from the end and not from the start.
hhafez
A: 

Thanks for your answers! There is one more way which came from the MSDN forum. http://social.msdn.microsoft.com/Forums/en-US/vclanguage/thread/ed0f6ef9-8911-4879-accb-b3c778a09d94

Manav Sharma
A: 

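I think you can still do it using library functions.

1. Use the strrev function to reverse both the string and the search string. (Note that strrev is a non-standard extension, so it may not be available with every compiler.)

2. Use the strstr function to search for the reversed search string in the reversed string.

3. Convert the index found in the reversed string back into a start index in the original string by subtracting it, together with the length of the search string, from the length of the original string.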
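The steps above can be sketched in standard C roughly as follows. Since strrev is not part of the standard library, this sketch uses its own reversal helper; the names reversed_copy and last_index_by_reversal are hypothetical, and note that the search string has to be reversed as well as the string being searched.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated reversed copy of s, or NULL if allocation fails. */
static char *reversed_copy(const char *s)
{
    size_t len = strlen(s);
    char *r = malloc(len + 1);
    if (r == NULL)
        return NULL;
    for (size_t i = 0; i < len; ++i)
        r[i] = s[len - 1 - i];
    r[len] = '\0';
    return r;
}

/* Hypothetical helper: index of the last occurrence of needle in haystack,
   or -1 if it is absent or memory could not be allocated. */
static long last_index_by_reversal(const char *haystack, const char *needle)
{
    long result = -1;
    char *rh = reversed_copy(haystack);
    char *rn = reversed_copy(needle);

    if (rh != NULL && rn != NULL) {
        /* The first match in the reversed text corresponds to the
           last match in the original text. */
        const char *hit = strstr(rh, rn);
        if (hit != NULL) {
            size_t rev_index = (size_t)(hit - rh);
            /* Translate back: original length minus reversed index
               minus the needle length. */
            result = (long)(strlen(haystack) - rev_index - strlen(needle));
        }
    }
    free(rh);  /* free(NULL) is a no-op, so this is safe on failure */
    free(rn);
    return result;
}
```

This costs two allocations and two extra passes over the data per call, so it is mostly of interest when reusing strstr matters more than speed.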

Ashish
+1  A: 

The standard C library does not have a "reverse strstr" function, so you have to find or write your own.

I came up with a couple of solutions of my own, and added some testing and benchmarking code together with the other functions in this thread. For those curious, running on my laptop (Ubuntu karmic, amd64 architecture) the output looks like this:

$ gcc -O2 --std=c99 strrstr.c && ./a.out
#1 0.123 us last_strstr
#2 0.440 us theo
#3 0.460 us cordelia
#4 1.690 us digitalross
#5 7.700 us backwards_memcmp
#6 8.600 us sinan

Your results may be different and, depending on your compiler and library, the ordering of the results may also be different.

To get the offset (index) of the match from the beginning of the string, use pointer arithmetic:

char *match = last_strstr(haystack, needle);
ptrdiff_t index;
if (match != NULL)
    index = match - haystack;
else
    index = -1;

And now, the larch (note that this is purely in C, I do not know C++ well enough to give an answer for it):

/*
 * In response to
 * http://stackoverflow.com/questions/1634359/is-there-a-reverse-fn-for-strstr
 *
 * Basically, strstr but return last occurrence, not first.
 *
 * This file contains several implementations and a harness to test and
 * benchmark them.
 *
 * Some of the implementations of the actual function are copied from
 * elsewhere; they are commented with the location. The rest of the code
 * was written by Lars Wirzenius ([email protected]) and is hereby released into
 * the public domain. No warranty. If it turns out to be broken, you get
 * to keep the pieces.
 */


#include <string.h>
#include <stdlib.h>


/* By liw. */
static char *last_strstr(const char *haystack, const char *needle)
{
    if (*needle == '\0')
        return (char *) haystack;

    char *result = NULL;
    for (;;) {
        char *p = strstr(haystack, needle);
        if (p == NULL)
            break;
        result = p;
        haystack = p + 1;
    }

    return result;
}


/* By liw. */
static char *backwards_memcmp(const char *haystack, const char *needle)
{
    size_t haylen = strlen(haystack);

    if (*needle == '\0')
        return (char *) haystack;

    size_t needlelen = strlen(needle);
    if (needlelen > haylen)
        return NULL;

    const char *p = haystack + haylen - needlelen;
    for (;;) {
        if (memcmp(p, needle, needlelen) == 0)
            return (char *) p;
        if (p == haystack)
            return NULL;
        --p;
    }
}


/* From http://stuff.mit.edu/afs/sipb/user/cordelia/Diplomacy/mapit/strrstr.c
 */
static char *cordelia(const char *s1, const char *s2)
{
 const char *sc1, *sc2, *psc1, *ps1;

 if (*s2 == '\0')
  return((char *)s1);

 ps1 = s1 + strlen(s1);

 while(ps1 != s1) {
  --ps1;
  for (psc1 = ps1, sc2 = s2; ; )
   if (*(psc1++) != *(sc2++))
    break;
   else if (*sc2 == '\0')
    return ((char *)ps1);
 }
 return ((char *)NULL);
}


/* From http://stackoverflow.com/questions/1634359/
   is-there-a-reverse-fn-for-strstr/1634398#1634398 (DigitalRoss). */
static char *reverse(const char *s)
{
  if (s == NULL)
    return NULL;
  size_t i, len = strlen(s);
  char *r = malloc(len + 1);

  for(i = 0; i < len; ++i)
    r[i] = s[len - i - 1];
  r[len] = 0;
  return r;
}
char *digitalross(const char *s1, const char *s2)
{
  size_t  s1len = strlen(s1);
  size_t  s2len = strlen(s2);
  const char *s;

  if (s2len == 0)
    return (char *) s1;
  if (s2len > s1len)
    return NULL;
  for (s = s1 + s1len - s2len; s >= s1; --s)
    if (strncmp(s, s2, s2len) == 0)
      return (char *) s;
  return NULL;
}


/* From http://stackoverflow.com/questions/1634359/
  is-there-a-reverse-fn-for-strstr/1634487#1634487 (Sinan Ünür). */

char *sinan(const char *source, const char *target)
{
    const char *current;
    const char *found = NULL;

    if (*target == '\0')
        return (char *) source;

    size_t target_length = strlen(target);
    current = source + strlen(source) - target_length;

    while ( current >= source ) {
        if ( (found = strstr(current, target)) ) {
            break;
        }
        current -= 1;
    }

    return (char *) found;
}


/* From http://stackoverflow.com/questions/1634359/
  is-there-a-reverse-fn-for-strstr/1634441#1634441 (Theo Spears). */
char *theo(const char* haystack, const char* needle)
{
  size_t needle_length = strlen(needle);
  const char* haystack_end = haystack + strlen(haystack) - needle_length;
  const char* p;
  size_t i;

  if (*needle == '\0')
    return (char *) haystack;
  for(p = haystack_end; p >= haystack; --p)
  {
    for(i = 0; i < needle_length; ++i) {
      if(p[i] != needle[i])
        goto next;
    }
    return (char *) p;

    next:;
  }
  return 0;
}


/*
 * The rest of this code is a test and timing harness for the various
 * implementations above.
 */


#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>


/* Check that the given function works. */
static bool works(const char *name, char *(*func)(const char *, const char *))
{
    struct {
        const char *haystack;
        const char *needle;
        int offset;
    } tests[] = {
        { "", "", 0 },
        { "", "x", -1 },
        { "x", "", 0 },
        { "x", "x", 0 },
        { "xy", "x", 0 },
        { "xy", "y", 1 },
        { "xyx", "x", 2 },
        { "xyx", "y", 1 },
        { "xyx", "z", -1 },
        { "xyx", "", 0 },
    };
    const int num_tests = sizeof(tests) / sizeof(tests[0]);
    bool ok = true;

    for (int i = 0; i < num_tests; ++i) {
        int offset;
        char *p = func(tests[i].haystack, tests[i].needle);
        if (p == NULL)
            offset = -1;
        else
            offset = p - tests[i].haystack;
        if (offset != tests[i].offset) {
            fprintf(stderr, "FAIL %s, test %d: returned %d, haystack = '%s', "
                            "needle = '%s', correct return %d\n",
                            name, i, offset, tests[i].haystack, tests[i].needle,
                            tests[i].offset);
            ok = false;
        }
    }
    return ok;
}


/* Dummy function for calibrating the measurement loop. */
static char *dummy(const char *haystack, const char *needle)
{
    return NULL;
}


/* Measure how long it will take to call the given function with the
   given arguments the given number of times. Return clock ticks. */
static clock_t repeat(char *(*func)(const char *, const char *),
                       const char *haystack, const char *needle,
                       long num_times)
{
    clock_t start, end;

    start = clock();
    for (long i = 0; i < num_times; ++i) {
        func(haystack, needle);
    }
    end = clock();
    return end - start;
}


static clock_t min(clock_t a, clock_t b)
{
    if (a < b)
        return a;
    else
        return b;
}


/* Measure the time to execute one call of a function, and return it
   in seconds (clock ticks divided by CLOCKS_PER_SEC; see clock(3)). */
static double timeit(char *(*func)(const char *, const char *))
{
    /* The arguments for the functions to be measured. We deliberately
       choose a case where the haystack is large and the needle is in
       the middle, rather than at either end. Obviously, any test data
       will favor some implementations over others. This is the weakest
       part of the benchmark. */

    const char haystack[] = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "b"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
    const char needle[] = "b";

    /* First we find out how many repeats we need to do to get a sufficiently
       long measurement time. These functions are so fast that measuring
       only a small number of repeats will give wrong results. However,
       we don't want to do a ridiculously long measurement, either, so 
       start with one repeat and multiply it by 10 until the total time is
       about 0.2 seconds. 

       Finally, we measure the dummy function the same number of times
       to get rid of the call overhead.

       */

    clock_t mintime = 0.2 * CLOCKS_PER_SEC;
    clock_t clocks;
    long repeats = 1;
    for (;;) {
        clocks = repeat(func, haystack, needle, repeats);
        if (clocks >= mintime)
            break;
        repeats *= 10;
    }

    clocks = min(clocks, repeat(func, haystack, needle, repeats));
    clocks = min(clocks, repeat(func, haystack, needle, repeats));

    clock_t dummy_clocks;

    dummy_clocks = repeat(dummy, haystack, needle, repeats);
    dummy_clocks = min(dummy_clocks, repeat(dummy, haystack, needle, repeats));
    dummy_clocks = min(dummy_clocks, repeat(dummy, haystack, needle, repeats));

    return (double) (clocks - dummy_clocks) / repeats / CLOCKS_PER_SEC;
}


/* Array of all functions. */
struct func {
    const char *name;
    char *(*func)(const char *, const char *);
    double secs;
} funcs[] = {
#define X(func) { #func, func, 0 }
    X(last_strstr),
    X(backwards_memcmp),
    X(cordelia),
    X(digitalross),
    X(sinan),
    X(theo),
#undef X
};
const int num_funcs = sizeof(funcs) / sizeof(funcs[0]);


/* Comparison function for qsort, comparing timings. */
int funcmp(const void *a, const void *b)
{
    const struct func *aa = a;
    const struct func *bb = b;

    if (aa->secs < bb->secs)
        return -1;
    else if (aa->secs > bb->secs)
        return 1;
    else
        return 0;
}


int main(void)
{

    bool ok = true;
    for (int i = 0; i < num_funcs; ++i) {
        if (!works(funcs[i].name, funcs[i].func)) {
            fprintf(stderr, "%s does not work\n", funcs[i].name);            
            ok = false;
        }
    }
    if (!ok)
        return EXIT_FAILURE;

    for (int i = 0; i < num_funcs; ++i)
        funcs[i].secs = timeit(funcs[i].func);
    qsort(funcs, num_funcs, sizeof(funcs[0]), funcmp);
    for (int i = 0; i < num_funcs; ++i)
        printf("#%d %.3f us %s\n", i+1, funcs[i].secs * 1e6, funcs[i].name);

    return 0;
}
Lars Wirzenius
Sorry about the length. All the interesting parts (the actual implementations of reverse strstr) are at the top of the code, so should be easy to find.
Lars Wirzenius