views:

155

answers:

2

How do I check if a string is a substring of another string in C without using the substr function?

+5  A: 

You could use the rolling hash method (Rabin-Karp), or implement KMP, which is a bit like a state machine.

Generally, Rabin-Karp is used when you are searching for multiple target strings, because the second check it requires (for actual equality rather than just hash equality) is well worth it to be able to examine the source string only once. In your case, either will do just fine.

There is also a more naive solution which is O(n^2) and requires checking for the target string at every position in the source string.

danben
… both of which would be inferior to `strstr`
Potatoswatter
I assumed he meant he needed to implement a solution. But for argument's sake, how is `strstr` implemented?
danben
There's also Boyer-Moore (http://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm), which is generally faster than KMP. But yeah, @Potatoswatter, why do you think `strstr` is so awesome? I'm pretty sure the standard doesn't say anything about the implementation or its efficiency.
Laurence Gonsalves
@Laurence: because the task is so simple. The only way to improve on the code I posted would be to change the `strchr` to search for the least frequent character. Which is a simple alteration, but highly specific by presuming you know beforehand what that character is.
Potatoswatter
This is a bit like saying that binary search couldn't possibly improve on linear search because linear search is so simple.
danben
Except, this is by definition a linear search. Linear searches always take O(N) time.
Potatoswatter
+1  A: 

Don't use substr… use strstr/strnstr!

Seriously, nothing else will be as fast unless it does exactly the same thing. Why don't you want that?

Edit: yeah, there are algorithms with better asymptotic performance. For text-like searches for a single string, I'd be pretty surprised to see anything outperform a standard library, though.

Potatoswatter
I am going to challenge your claim that this is faster than KMP (or Boyer-Moore, as @Laurence Gonsalves pointed out). This is the naive n^2 solution.
danben
It's O( strlen(s) * strlen(sub) ), which isn't anything-squared… you can improve it by coding `strcmp` to return early, which doesn't improve anything big-O but is much faster in practice.
Potatoswatter
"It's O( strlen(s) * strlen(sub) ), which isn't anything-squared". One of those has to be larger than the other, right?
danben
@danben: One may be a constant, or equivalently may have an upper bound.
Potatoswatter
No, `strlen(s)` for an arbitrary `s` is not a constant value. And there is no upper bound on string lengths when the entirety of the problem statement is "check that a string is a substring of another string". Even with a `strcmp` that fails fast, this is O(n^2) in the worst case, and possibly in the average case as well.
danben
@danben: Fair nuff, edited. For things like genetics, you could make strlen hurt.
Potatoswatter