I need to search a large string for a particular substring. The substring will start with Testing=, but everything within the double quotes could be different because it's a user login.

So examples of the substring I need are

Testing="network\smithj"  

or

Testing="network\rodgersm"  

Do my requirements make sense? How can I do this in C#?

+10  A: 

This is a great application of a regular expression.

"Testing=\"[^\"]*\""

You will use it like so:

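// Requires: using System.Text.RegularExpressions;
Regex reg = new Regex("Testing=\"[^\"]*\"");
// Groups[0] is the whole match (prefix and quotes included); Value is "" if nothing matched.
string login = reg.Match(yourInputString).Groups[0].Value;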

The above works with your two given test cases.
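If only the login inside the quotes is wanted, rather than the whole Testing="..." text, a capture group can pull it out. The following is a minimal sketch, not a definitive implementation; the names LoginExtractor, ExtractLogin and the group name login are made up for illustration and do not come from the question.

```csharp
using System.Text.RegularExpressions;

static class LoginExtractor
{
    // Same pattern as above, with the quoted part captured in a named group.
    // "login" is a hypothetical group name chosen for this sketch.
    private static readonly Regex TestingPattern =
        new Regex("Testing=\"(?<login>[^\"]*)\"");

    // Returns the text between the quotes, or null when there is no match.
    public static string ExtractLogin(string input)
    {
        Match match = TestingPattern.Match(input);
        return match.Success ? match.Groups["login"].Value : null;
    }
}
```

Checking match.Success separates "no match" from an empty login such as Testing="", which the one-line version above cannot distinguish when only the captured value is inspected.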

Wikipedia has a great article on regular expressions if you are not familiar with them. And if you search Google, you can find a wealth of information on how to use regular expressions in C#.

jjnguy
@0xA3 thanks for the fix.
jjnguy
+6  A: 

Something like:

const string Prefix = "Testing=\"";

static string FindTestingSubstring(string text)
{
    int start = text.IndexOf(Prefix);
    if (start == -1)
    {
        return null; // Or throw an exception
    }
    int end = text.IndexOf('\"', start + Prefix.Length);
    if (end == -1)
    {
        return null; // Or throw an exception
    }
    return text.Substring(start + Prefix.Length, end - start - Prefix.Length);
}

An alternative is to use a regular expression - but when the pattern is reasonably simple, I personally prefer simple string manipulation. It depends on how comfortable you are with regexes though :)
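One possible variation on the method above, as a sketch under a couple of assumptions: it uses the Try-pattern instead of returning null, and it passes StringComparison.Ordinal so the prefix is compared character for character (string.IndexOf(string) without that argument does a culture-sensitive search). The names TestingFinder and TryFindTestingValue are hypothetical.

```csharp
using System;

static class TestingFinder
{
    private const string Prefix = "Testing=\"";

    // Try-pattern variant: returns false instead of null when nothing is found.
    public static bool TryFindTestingValue(string text, out string value)
    {
        value = null;

        // Ordinal comparison: match the prefix character for character,
        // independent of the current culture.
        int start = text.IndexOf(Prefix, StringComparison.Ordinal);
        if (start == -1)
        {
            return false;
        }

        int valueStart = start + Prefix.Length;
        int end = text.IndexOf('"', valueStart);
        if (end == -1)
        {
            return false; // opening quote without a closing quote
        }

        value = text.Substring(valueStart, end - valueStart);
        return true;
    }
}
```

Like the original, this returns only the part between the quotes, so for the question's first example the value would be network\smithj.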

Jon Skeet
You have to watch out for those awkward cases, such as "I said, \"Hello, World!\"".
Rafe
@Rafe: Yes, but there are far fewer of those than there are special regex characters - and in most cases the compiler will tell you about them, whereas the compiler doesn't know about regular expressions.
Jon Skeet
Hi Jon, I'm not sure I follow you here. What I was saying is that after you compute text.IndexOf('\"'), you then need to see whether the double quotes you found are preceded by an odd number of backslashes in order to avoid matching literal quotes instead of the closing quotes. This is a place where regexes really would be helpful, since your search string now is of the order of one character long.
Rafe
@Rafe: No you don't. The backslashes aren't in the string itself - they're only in the C# source code representation of the string. If this were a problem when using IndexOf, it would be a problem with regular expressions too. `text.IndexOf("\"")` *just* finds the first double-quote in the string. That's all we *need* to do, as far as we know from the question. If there are backslashes in the string, they're irrelevant - and they won't be found by that call, because the string we pass to `IndexOf` doesn't actually contain any backslashes.
Jon Skeet
Hi Jon, I can't see in the question where it implies the value attached to 'Testing' can't itself contain literal double quotes. If you're right, then I stand corrected :-)
Rafe
@Rafe: Well even if it *did*, it clearly isn't using backslash to escape them, given that there *are* backslashes in the sample data. Also if it did, the regex solution doesn't help either... where in your solution does it try to avoid escaped quotes? The point is that either both solutions would have to handle it, or neither of them would.
Jon Skeet
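The point made in the comments above, that the backslash in "\"" exists only in the C# source and not in the string at run time, can be checked with a small sketch. The helper names EscapeDemo, CountBackslashes and FirstQuote are invented for this illustration.

```csharp
using System;

static class EscapeDemo
{
    // Number of backslash characters actually present in a string at run time.
    public static int CountBackslashes(string s)
    {
        int count = 0;
        foreach (char c in s)
        {
            if (c == '\\') count++;
        }
        return count;
    }

    // Index of the first double quote, exactly as IndexOf sees it.
    public static int FirstQuote(string s)
    {
        return s.IndexOf('"');
    }
}
```

With these helpers, "\"".Length is 1 and CountBackslashes("\"") is 0, while CountBackslashes("Testing=\"network\\smithj\"") is 1: the only backslash in that string is the one separating network from smithj, and FirstQuote reports the quote right after Testing= at index 8.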
A: 

If the string you're searching is very large, you might not want to use regular expressions. Regexes are relatively slow at matching and will typically examine every character. Instead, look up the Boyer-Moore string matching algorithm, which typically examines only a fraction of the characters. The CLR implementation for string.IndexOf(string) may or may not use this algorithm - you'd have to check.

Ah, here's a useful link with some benchmark results: http://www.arstdesign.com/articles/fastsearch.html
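To give a feel for the idea, here is a rough sketch of the Horspool simplification of Boyer-Moore (bad-character shifts only, not the full algorithm). It is an illustration under stated assumptions, not a tuned implementation, and it makes no claim about being faster than string.IndexOf; the class name HorspoolSearch is hypothetical.

```csharp
using System.Collections.Generic;

static class HorspoolSearch
{
    // Returns the index of the first occurrence of pattern in text, or -1.
    // Comparison is ordinal (character by character).
    public static int IndexOf(string text, string pattern)
    {
        int m = pattern.Length;
        int n = text.Length;
        if (m == 0) return 0;
        if (m > n) return -1;

        // Shift table: for each character of the pattern except the last,
        // the distance from its last occurrence to the end of the pattern.
        var shift = new Dictionary<char, int>();
        for (int i = 0; i < m - 1; i++)
        {
            shift[pattern[i]] = m - 1 - i;
        }

        int pos = 0;
        while (pos <= n - m)
        {
            // Compare the pattern against the current window, right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j])
            {
                j--;
            }
            if (j < 0) return pos;

            // Skip ahead based on the text character under the pattern's
            // last position; characters not in the table allow a full skip.
            char last = text[pos + m - 1];
            int step;
            pos += shift.TryGetValue(last, out step) ? step : m;
        }
        return -1;
    }
}
```

For the question's case, HorspoolSearch.IndexOf(text, "Testing=\"") would locate the prefix, after which the closing quote can be found with an ordinary IndexOf('"', ...) call as in the earlier answer.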

Rafe