check is string is a substring of another string

views:

155

answers:

check is string is a substring of another string

How do I check if a string is a substring of another string in C without using the substr function?

+5 A:

You could use the rolling hash method (Rabin-Karp), or implement KMP, which is a bit like a state machine.

Generally, Rabin-Karp is used when you are searching for multiple target strings, because the second check it requires (for actual equality rather than just hash equality) is well worth it to be able to examine the source string only once. In your case, either will do just fine.

There is also a more naive solution which is O(n^2) and requires checking for the target string at every position in the source string.

danben 2010-02-19 04:33:45

… both of which would be inferior to `strstr`

Potatoswatter 2010-02-19 04:38:45

I assumed he meant he needed to implement a solution. But for argument's sake, how is `strstr` implemented?

danben 2010-02-19 04:40:48

There's also Boyer-Moore (http://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm), which is generally faster than KMP. But yeah, @Potatoswatter, why do you think `strstr` is so awesome? I'm pretty sure the standard doesn't say anything about the implementation or its efficiency.

Laurence Gonsalves 2010-02-19 04:48:19

@Laurence: because the task is so simple. The only way to improve on the code I posted would be to change the `strchr` to search for the least frequent character. Which is a simple alteration, but highly specific by presuming you know beforehand what that character is.

Potatoswatter 2010-02-19 05:00:11

This is a bit like saying that binary search couldn't possibly improve on linear search because linear search is so simple.

danben 2010-02-19 05:04:57

Except, this is by definition a linear search. Linear searches always take O(N) time.

Potatoswatter 2010-02-19 05:19:03

+1 A:

Don't use substr… use strstr/strnstr!

~~Seriously, nothing else will be as fast unless it does exactly the same thing. Why don't you want that?~~

Edit: yeah, there are algorithms with better asymptotic performance. For text-like searches for a single string, I'd be pretty surprised to see anything outperform a standard library, though.

Potatoswatter 2010-02-19 04:40:22

I am going to challenge your claim that this is faster than KMP (or Boyer-Moore, as @Laurence Gonsalves pointed out). This is the naive n^2 solution.

danben 2010-02-19 05:01:40

It's O( strlen(s) * strlen(sub) ), which isn't anything-squared… you can improve it by coding `strcmp` to return early, which doesn't improve anything big-O but is much faster in practice.

Potatoswatter 2010-02-19 05:12:54

"It's O( strlen(s) * strlen(sub) ), which isn't anything-squared". One of those has to be larger than the other, right?

danben 2010-02-19 05:17:00

@danben: One may be a constant, or equivalently may have an upper bound.

Potatoswatter 2010-02-19 05:19:50

No, `strlen(s)` for an arbitrary `s` is not a constant value. And there is no upper bound on string lengths when the entirety of the problem statement is "check that a string is a substring of another string". Even with a `strcmp` that fails fast, this is O(n^2) in the worst case, and possibly in the average case as well.

danben 2010-02-19 05:28:31

@danben: Fair nuff, edited. For things like genetics, you could make strlen hurt.

Potatoswatter 2010-02-19 05:40:15

ansaurus

tags:

views:

answers:

check is string is a substring of another string

related questions