ansaurus

Question

What is the most efficient (read time) string search method? (C#)

Answer 1

+3 A:

I would try to use a Regular Expression instead of rolling my own string search algorithm. You can precompile the regular expression to make it run faster.

driis 2009-03-01 11:39:31

Answer 2

A:

You could use a regex; it’s optimized for this kind of searching and manipulation.

You could also try IndexOf ...

string result = string.Empty;

if (startPos >= response.Length) 
    return result;

int startingIndex = response.IndexOf(startMatchString, startPos);
int rightOfStartIndex = startingIndex + startMatchString.Length;

if (startingIndex > -1 && rightOfStartIndex < response.Length)
{
    int endingIndex = response.IndexOf(endMatchString, rightOfStartIndex);
    if (endingIndex > -1)
        result = response.Substring(rightOfStartIndex, endingIndex - rightOfStartIndex);
}

return result;

Nescio 2009-03-01 11:40:02

really? because I often hear that regex is considered inefficient

Harry 2009-03-01 11:41:06

Try it first and optimize later if needed.

Ed Swangren 2009-03-01 11:43:10

Answer 3

+6 A:

It depends on what you're trying to find in the string. If you're looking for a specific sequence IndexOf/Contains are fast, but if you're looking for wild card patterns Regex is optimized for this kind of search.

Brian Rasmussen 2009-03-01 11:43:13

The start and end matchstrings don't change. it is the string that lies between these two that is the point of interest.

Harry 2009-03-01 11:46:07

@Harry: Is it fixed at runtime, but unknown at compile time or do you want to match wild cards? For the latter Regex would be my choice. Try that and profile it to see if it will be fast enough.

Brian Rasmussen 2009-03-01 11:51:42

@Brian: the start and end matchstrings are fixed and known at compile time

Harry 2009-03-01 22:14:35

Answer 4

A:

For very long strings you cannot beat the boyer-moore search algorithm. It is more complex than I might try to explain here, but The CodeProject site has a pretty good article on it.

Khadaji 2009-03-01 11:58:48

Answer 5

A:

As said before regex is your friend. You might want to look at RegularExpressions.Group. This way you can name part of the matched resultset.

Here is an example

Ruben 2009-03-01 12:00:16

Answer 6

+7 A:

There are a whole bunch of algorithms,

boyer and moore
Sunday
Knuth-Morris-Pratt
Rabin-Karp

I would recommend to use the simplified Boyer-Moore, called Boyer–Moore–Horspool.

The C-code appears at the wikipedia. For the java code look at

http://www.fmi.uni-sofia.bg/fmi/logic/vboutchkova/sources/BoyerMoore_java.html

A nice article about these is available under http://www.ibm.com/developerworks/java/library/j-text-searching.html

If you want to use built-in stuff go for regular expressions.

Luixv 2009-03-01 12:28:56

Answer 7

+2 A:

You can use String.IndexOf, but make sure you use StringComparison.Ordinal or it may be one order of magnitude slower.

private string Search2(int startPos, string startMatchString, string endMatchString, string response) {
    int startMarch = response.IndexOf(startMatchString, startPos, StringComparison.Ordinal);
    if (startMarch != -1) {
        startMarch += startMatchString.Length;
        int endMatch = response.IndexOf(endMatchString, startMarch, StringComparison.Ordinal);
        if (endMatch != -1) { return response.Substring(startMarch, endMatch - startMarch); }
    }
    return string.Empty;
}

Searching 1000 times a string at about the 40% of a 183 KB file took about 270 milliseconds. Without StringComparison.Ordinal it took about 2000 milliseconds.
Searching 1 time with your method took over 60 seconds as it creates a new string (O(n)) each iteration, making your method O(n^2).

ggf31416 2009-03-01 13:59:41

Answer 8

A:

Here's an example using IndexOf (beware: written from the top of my head, didn't test it):

int skip = 11;
int start = response.IndexOf(startMatchString, startPos);
if (start >= 0)
{
   int end = response.IndexOf(startMatchString, start + skip);
   if (end >= 0)
       return response.Substring(start + skip, end - start - skip);
   else
       return response.Substring(start + skip);
}
return string.Empty;

oefe 2009-03-01 14:10:58

ansaurus

tags:

views:

answers:

What is the most efficient (read time) string search method? (C#)

related questions