views:

802

answers:

7

What is the most memory efficient way to search within a string in ANSI C? (put the code up)

An example where this is needed is in embedded devices that are very short on available memory but nowadays have reasonable clock cycles.

+10  A: 

It depends on what you're looking for...but strchr() or strstr() are often appropriate. And those are extremely memory efficient since they use no extra memory.

Jonathan Leffler
+2  A: 

If you are looking for a substring, then strstr is very memory efficient. and for a char, strchr is also very memory efficient. Neither need extra storage.

I'm not sure if there is anything more that you are looking for.

Evan Teran
+4  A: 

I suppose it depends on what you're searching for, but a linear search/comparison uses no more memory than the two strings (the 'host', and the 'token'). For example:

char host[] = "this is my string to search";
char token[] = "y st";
int k = 0;
while(host[k] != '\0'){
  for(int t=0; (token[t]!='\0' && host[k+t]!='\0');){
    if(host[k] == token[t]){
      t++;  // we matched the first char of token, so advance
    }
    else{   // no match yet, reset the token counter and move along the host string
      k++;
      t = 0;
    }
  }
  k++;
}

(I may be slightly off in the implementation, but hopefully you get my idea.)

Library functions like strstr should be worth looking at, too.

warren
if(host[k] == taken[t]){ t++; k++; } ...
nlucaroni
I knew someone'd have a shorter version than what I was coming-up with off the top of my head :)
warren
+5  A: 

Advancing one character at a time is Θ((n-m+1) m). Check into the Boyer-Moore and Knuth-Morris-Pratt algorithms for more efficient ways of searching for substrings -- both as low as O(n). Your handy algorithms textbook should discuss both of them. The standard C library strstr function implements one or both, so use that instead of rolling your own.

Barry Brown
Good pointers, but I'm looking for _memory_ efficiency, not complexity-related efficiency... That is only a secondary concern for me. If the processor has a small cache (say 2kB) you may not want to do _any_ memory operations and just use the registers the CPU provides.
Vasco Duarte
@Barry: are you sure strstr() ever implements BM or KMP searching? Those typically require a pre-computation to speed up the search, and it is seldom reasonable to do that directly - the brute force works better for many situations.
Jonathan Leffler
You're right; things may have changed since the last time my algorithms class. The strstr.c in glibc doesn't use BM or KMP, nor does it do the naive algorithm.
Barry Brown
What strstr() does depends on your compiler - Digital Mars uses BM, VC9 has a straight-forward set of nested while loops.
Michael Burr
@ Mike B: interesting feedback - thanks. I ran into a bug in an ancient version of strstr() on SunOS (pre-Solaris - that ancient!); it computed strlen() on the search string each iteration, which was critical since I was searching 64 KB blocks of text. Sun fixed it (maybe not because of me).
Jonathan Leffler
+1  A: 

Depending on the kind of search and the boundary conditions there is a large number of different algorithms for searching a sub-string in a string. A large collection is available here: http://www-igm.univ-mlv.fr/~lecroq/string/index.html

Nicola Bonelli
+1  A: 
paperhorse
A: 

I recently came across this problem and just want to share my thoughts.

"Memory efficient" as I interpreted it is the ability to search a long string of size M given only N amount of available memory, M > N. This is alternative to an efficient use of memory per character available in the string to search. And I feel may be more relevant to the original poster's embedded environment (that may have a large storage).

Regardless of which algorithm you use to do the comparison (the more efficient the better of course), I would choose to use a circular buffer (which should be greater than the string you're search for, at least 2X perhaps?) and continuously load the stream of characters to the buffer while the search algorithm advances. The search algorithm must be able to know how to wrap around the circular buffer (or add a level of indirection to hide the circular buffer from the search algorithm).