Why do you use _strlwr(string); in init_stristr()? It's not a standard function. Presumably it's for locale support, but as it's not standard, I'd just use:
char_table[i] = tolower(i);
Why do you use _strlwr(string); in init_stristr()? It's not a standard function. Presumably it's for locale support, but as it's not standard, I'd just use:
char_table[i] = tolower(i);
* I deliberately chose not to comment it. You should have at least
* as much fun trying to understand it, as I had to write it :-).
Super ego power, I choose you!
This is the slowest implementation I've ever seen. Meaning that it takes longer to understand than any other implementation I've had the ahem 'joy' ahem of maintaining.
I'd advice you to take some of the common strcasestr implementation that already exists. For example of glib, glibc, OpenBSD, FreeBSD, etc. You can search for more with google.com/codesearch. You can then make some performance measurements and compare the different implementation.
The code you posted is about half as fast as strcasestr
.
$ gcc -Wall -o my_stristr my_stristr.c
steve@solaris:~/code/tmp
$ gcc -Wall -o strcasestr strcasestr.c
steve@solaris:~/code/tmp
$ ./bench ./my_stristr > my_stristr.result ; ./bench ./strcasestr > strcasestr.result;
steve@solaris:~/code/tmp
$ cat my_stristr.result
run 1... time = 6.32
run 2... time = 6.31
run 3... time = 6.31
run 4... time = 6.31
run 5... time = 6.32
run 6... time = 6.31
run 7... time = 6.31
run 8... time = 6.31
run 9... time = 6.31
run 10... time = 6.31
average user time over 10 runs = 6.3120
steve@solaris:~/code/tmp
$ cat strcasestr.result
run 1... time = 3.82
run 2... time = 3.82
run 3... time = 3.82
run 4... time = 3.82
run 5... time = 3.82
run 6... time = 3.82
run 7... time = 3.82
run 8... time = 3.82
run 9... time = 3.82
run 10... time = 3.82
average user time over 10 runs = 3.8200
steve@solaris:~/code/tmp
The main
function was:
int main(void)
{
char * needle="hello";
char haystack[1024];
int i;
for(i=0;i<sizeof(haystack)-strlen(needle)-1;++i)
{
haystack[i]='A'+i%57;
}
memcpy(haystack+i,needle, strlen(needle)+1);
/*printf("%s\n%d\n", haystack, haystack[strlen(haystack)]);*/
init_stristr();
for (i=0;i<1000000;++i)
{
/*my_stristr(haystack, needle);*/
strcasestr(haystack,needle);
}
return 0;
}
It was suitably modified to test both implementations. I notice as I am typing this up I left in the init_stristr
call, but it shouldn't change things too much. bench
is just a simple shell script:
#!/bin/bash
function bc_calc()
{
echo $(echo "scale=4;$1" | bc)
}
time="/usr/bin/time -p"
prog="$1"
accum=0
runs=10
for a in $(jot $runs 1 $runs)
do
echo -n "run $a... "
t=$($time $prog 2>&1| grep user | awk '{print $2}')
echo "time = $t"
accum=$(bc_calc "$accum+$t")
done
echo -n "average user time over $runs runs = "
echo $(bc_calc "$accum/$runs")
Assuming both input strings are already lowercase.
int StringInStringFindFirst(const char* p_cText, const char* p_cSearchText)
{
int iTextSize = strlen(p_cText);
int iSearchTextSize = strlen(p_cSearchText);
char* p_cFound = NULL;
if(iTextSize >= iSearchTextSize)
{
int iCounter = 0;
while((iCounter + iSearchTextSize) <= iTextSize)
{
if(memcmp( (p_cText + iCounter), p_cSearchText, iSearchTextSize) == 0)
return iCounter;
iCounter ++;
}
}
return -1;
}
You could also, try using masks... if for example most of the strings you are going to compare only contains chars from a to z, maybe it's worth to do something like this.
long GetStringMask(const char* p_cText)
{
long lMask=0;
while(*p_cText != '\0')
{
if (*p_cText>='a' && *p_cText<='z')
lMask = lMask | (1 << (*p_cText - 'a') );
else if(*p_cText != ' ')
{
lMask = 0;
break;
}
p_cText ++;
}
return lMask;
}
Then...
int main(int argc, char* argv[])
{
char* p_cText = "this is a test";
char* p_cSearchText = "test";
long lTextMask = GetStringMask(p_cText);
long lSearchMask = GetStringMask(p_cSearchText);
int iFoundAt = -1;
// If Both masks are Valid
if(lTextMask != 0 && lSearchMask != 0)
{
if((lTextMask & lSearchMask) == lSearchMask)
{
iFoundAt = StringInStringFindFirst(p_cText, p_cSearchText);
}
}
else
{
iFoundAt = StringInStringFindFirst(p_cText, p_cSearchText);
}
return 0;
}
The kinds of programmer you do not want to have the misfortune to work with, write comments like these:
- My personal strstr() implementation that beats most other algorithms. * Until someone tells me otherwise, I assume that this is the * fastest implementation of strstr() in C. * I deliberately chose not to comment it. You should have at least * as much fun trying to understand it, as I had to write it :-).
As someone always told me, "Green lines are your friend" !
( Unless you messed with your IDE and comments aren't green :) )
use boost string algo. It is available, cross platform, and only a header file (no library to link in). Not to mention that you should be using boost anyway.
#include <boost/algorithm/string/find.hpp>
const char* istrstr( const char* haystack, const char* needle )
{
using namespace boost;
iterator_range<char*> result = ifind_first( haystack, needle );
if( result ) return result.begin();
return NULL;
}
If you can control the needle string so that it is always in lower case, then you can write a modified version of stristr() to avoid the lookups for that, and thus speed up the code. It isn't as general, but it can be faster - slightly faster. Similar comments apply to the haystack, but you are more likely to be reading the haystack from sources outside your control for you cannot be certain that the data meets the requirement.
Whether the gain in performance is worth it is another question altogether. For 99% of applications, the answer is "No, it is not worth it". Your application might be one of the tiny minority where it matters. More likely, it is not.