views:

166

answers:

5

Alright, so what i have, is a large string i need to parse, and what i need to happen, is find all the instances of extract"(me,i-have lots. of]punctuation, and store them to a list.

So say this piece of string was in the beginning and middle of the larger string, both of them would be found, and their index's would be added to the List. and the List would contain 0 and the other index whatever it would be.

Ive been playing around, and the string.IndexOf does almost what i'm looking for, and ive written some code. But i cant seem to get it to work:

            List<int> inst = new List<int>();
            int index = 0;
            while (index < source.LastIndexOf("extract\"(me,i-have lots. of]punctuation", 0) + 39)
            {
                int src = source.IndexOf("extract\"(me,i-have lots. of]punctuation", index);
                inst.Add(src);
                index = src + 40;
            }
  • inst = The list
  • source = The large string

Any better ideas?

+6  A: 

I was bored so I wrote this extension method. It's untested but should give you a good start.

public static List<int> AllIndexesOf(this string str, string value) {
    if (String.IsNullOrEmpty(value))
        throw new ArgumentException("the string to find may not be empty", "value");
    List<int> indexes = new List<int>();
    for (int index = 0;; index += value.Length) {
        index = str.IndexOf(value, index);
        if (index == -1)
            return indexes;
        indexes.Add(index);
    }
}

If you put this into a static class and import it with using, it appears as a method on any string, and you can just do:

List<int> indexes = "fooStringfooBar".AllIndexesOf("foo");

For more information on extension methods, http://msdn.microsoft.com/en-us/library/bb383977.aspx

Edit: and the same using an iterator:

public static IEnumerable<int> AllIndexesOf(this string str, string value) {
    if (String.IsNullOrEmpty(value))
        throw new ArgumentException("the string to find may not be empty", "value");
    for (int index = 0;; index += value.Length) {
        index = str.IndexOf(value, index);
        if (index == -1)
            break;
        yield return index;
    }
}
Matti Virkkunen
Sorry if i sound stupid, but ive never really worked with extensions before, how would i go about using that / learning about them?
Tommy
Why not use IEnumerable<int> and yield return index instead of the indexes list?
m0sa
@m0sa: Good point. Added another version just for the fun of it.
Matti Virkkunen
A: 

Based on the code I've used for finding multiple instances of a string within a larger string, your code would look like:

List<int> inst = new List<int>();
int index = 0;
while (index >=0)
{
    index = source.IndexOf("extract\"(me,i-have lots. of]punctuation", index);
    inst.Add(index);
    index++;
}
Corin
A: 
public List<int> GetPositions(string source, string searchString)
{
    List<int> ret = new List<int>();
    int len = searchString.Length;
    int start = -len;
    while (true)
    {
        start = source.IndexOf(searchString, start + len);
        if (start == -1)
        {
            break;
        }
        else
        {
            ret.Add(start);
        }
    }
    return ret;
}

Call it like this:

List<int> list = GetPositions("bob is a chowder head bob bob sldfjl", "bob");
// list will contain 0, 22, 26
MusiGenesis
+1  A: 

Why don't you use the built in RegEx class:

public static IEnumerable<int> GetAllIndexes(this string source, string matchString)
{
   matchString = Regex.Escape(matchString);
   foreach (Match match in Regex.Matches(source, matchString))
   {
      yeild match.Index;
   }
   return indexes;
}

If you do need to reuse the expression then compile it and cache it somewhere. Change the matchString param to a Regex matchExpression in another overload for the reuse case.

csaam
+1  A: 

using LINQ

public static IEnumerable<int> IndexOfAll(this string sourceString, string subString)
        {
            return Regex.Matches(sourceString, subString).Cast<Match>().Select(m => m.Index);
        }
ehosca
You forgot to escape the subString though.
csaam
true ... true ...
ehosca