views:

260

answers:

7

Possible Duplicate:
substring algorithm

Given two strings, A and B, how to find the first position of B in A? For instance, A = " ab123cdefgcde"; B= "cde" Then the first position of B in A is 5.

Is there any trick to solve this problem or just search A from the start?

+5  A: 

You really must to scan A from the start.

There are good algorithms of fast substring search, e.g. http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

There is also a standard function strstr:

strstr(A,B)

http://www.cplusplus.com/reference/clibrary/cstring/strstr/

osgx
Or `std::string::find` in the case of C++.
Hans W
A: 

The optimal way to solve this is by using the KMP algorithm: http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

IVlad
Is KMP trick really fastest for all cases? It is a bit outdated (1977). I think, there must be some faster tricks.
osgx
KMP gives the best worst-case performance as far as I know.
IVlad
+1  A: 

"Trick" is another word for algorithm, I guess. The most famous one is Knuth-Morris-Pratt.

unwind
A: 

In C programming you have function strstr to find the sub string position in the source String.

pavun_cool
A: 

If you have a big string and you're going to search for many substrings,
it's good to have in mind the suffix array structure.
Basically, you create an array of pointers and you sort it.
Then you can locate any substring with a binary search.

Nick D
A: 

The problem that you are solving is the "exact string matching" problem. The naive solution runs in O(n^2) time, but you can do much better than that. Some linear-time algorithms to solve this problem are Knuth-Morris-Pratt (KMP), Boyer-Moore, and Apostolico-Giancarlo. Another way to solve it is by constructing a finite state automaton that enters an accepting state when it sees the pattern string. The best possible solution is O(n), and all those have that worst-case running time. You do have to scan the string from one end to the other; however, it is possible to skip a fraction of the characters (which Boyer-Moore and Apostolico-Giancarlo will do), since some mismatches can imply other mismatches.

If you need to code this yourself, I recommend you go with the Knuth-Morris-Pratt algorithm, since it is a little bit more intuitive and easier to implement than the other solutions I've mentioned. Most programming languges, though, have an "indexOf" or "find" function that will solve this for you.

Michael Aaron Safyan