tags:

views:

152

answers:

3

I'm studying string searching algorithms now and wondering what algorithm is used for .NET String.Contains function for example. Reflector shows that this function is used but I have no idea what its name means.

private static extern int InternalFindNLSStringEx(IntPtr handle, string localeName, int flags, string source, int sourceCount, int startIndex, string target, int targetCount);
+2  A: 

Why using reflector when the source code is available? http://weblogs.asp.net/scottgu/archive/2008/01/16/net-framework-library-source-code-now-available.aspx

Stephane
That still wont show you any native/internal code, as this method does.
leppie
That's true, only the .Net code is available, erf
Stephane
I was trying to use the source code but it doesn't show me the algorithm. Am I doing something wrong or there is no way to see the real implementation?
Hun1Ahpu
+2  A: 

I’m guessing of course but I’d say that it’s just the naive string search implementation, having O(n*m) performance.

Now, this might be phenomenally wrong but the MSDN doesn’t specify the performance of this method so it’s not safe to assume a better performance.

Furthermore, most advanced pattern searching methods are quite specialized for certain string types and while there are better general-purpose search algorithms, implementing it in String.IndexOf is somewhat of an unnecessary optimization.

The reason is simple: if you require an efficient pattern search then you’ll implement your own anyway, custom-tailored to fit your particular data. So there’s no need to implement something fancy in the general-purpose library.

Konrad Rudolph
Thank you for reasonable answer.
Hun1Ahpu
I could not agree less; why would you implement your own general string search if there is already one in the library?Furthermore, string searching algorithms are exceptions since there is Boyer-Moore which is so nice (all pro no con) that is deemed standard benchmark. What I mean to say is that usually choosing an algorithm for a library can be really hard, but with strings I would not expect anything less to be used in any library worth anything. Please do correct me if I am wrong.
Unreason
@Unreason: in most applications it simply doesn’t matter. Using `Find` for anything but trivial tasks results in complicated logic and code bloat anyway. Better use regular expressions or a parser. Where you *do* need to use pattern finding and where it matters, you rarely deal with Unicode. Often it’s highly specialized like DNA sequences. And *of course* you don’t program that yourself – you use ready-made libraries. But these are specialized and don’t belong in a general framework like the .NET framework.
Konrad Rudolph
@Unreason: and Boyer-Moore depends critically on the size of the alphabet which is forbidding for Unicode. It’s possible to modify the algorithm to use hashing instead but already the parameters are getting murky. I think few people who work on a general framework like .NET know how to get this right (more efficient than the naive search in the general case and 100% bug-free) without making the implementation expensive.
Konrad Rudolph
(cont’d) Finally, Boyer-Moore is by no means the last word; Horspool is often (most of the time?) more efficient (not to mention easier to implement). I’m sure there are modern variants which tweak Horspool even more. And, without having performed any benchmarks, I’ll wager that for most applications (i.e. *relatively* small Unicode strings) naive search outperforms Horspool hands down.
Konrad Rudolph
+ for comment - makes sense :) For completeness sake here's another discussion that coincides with your reasoning http://yedda.com/questions/String_IndexOf_search_algorithm_net_1499176185184/
Unreason
A: 

I couldn't find that method stub, but my best guess is that it local-specific case-folding for case-insensitive searches.

leppie