ansaurus

Question

Find the first occurrence/starting index of the sub-array in C#.

Answer 1

+1 A:

The simplest way is probably this:

public static class ArrayExtensions
{
    private static bool isMatch(int[] x, int[] y, int index)
    {
        for (int j = 0; j < y.Length; ++j)
            if (x[j + index] != y[j]) return false;
        return true;
    }

    public static int IndexOf(this int[] x, int[] y)
    {
        for (int i = 0; i < x.Length - y.Length + 1; ++i)
            if (isMatch(x, y, i)) return i;
        return -1;
    }
}

But it's definitely not the fastest way.

Mark Byers 2009-11-22 23:58:28

Answer 2

+1 A:

I find something along the following lines more intuitive, but that may be a matter of taste.

public static class ArrayExtensions
{
    public static int StartingIndex(this int[] x, int[] y)
    {
        var xIndex = 0;
        while(xIndex < x.length)
        {
            var found = xIndex;
            var yIndex = 0;
            while(yIndex < y.length && xIndex < x.length && x[xIndex] == y[yIndex])
            {
                xIndex++;
                yIndex++;
            }

            if(yIndex == y.length-1)
            {
                return found;
            }

            xIndex = found + 1;
        }

        return -1;
    }
}

This code also addresses an issue I believe your implementation may have in cases like x = {3, 3, 7}, y = {3, 7}. I think what would happen with your code is that it matches the first number, then resets itself on the second, but starts matching again on the third, rather than stepping back to the index just after where it started matching. May be missing something, but it's definitely something to consider and should be easily fixable in your code.

Nikolas Stephan 2009-11-22 23:59:23

Your code suffers from the same problem as Jeffreys: it fails on new[] { 9, 8, 3 }.StartingIndex(new[] { 3, 4 }).

Mark Byers 2009-11-23 00:12:56

@Mark thanks for pointing out.

Jeffrey C 2009-11-23 00:14:46

Have resolved this issue by adding an extra clause to the inner while to check whether xIndex is still in range.

Nikolas Stephan 2009-11-23 09:48:02

Answer 3

+2 A:

Simplest to write?

    return (from i in Enumerable.Range(0, 1 + x.Length - y.Length)
            where x.Skip(i).Take(y.Length).SequenceEqual(y)
            select (int?)i).FirstOrDefault().GetValueOrDefault(-1);

Not quite as efficient, of course... a bit more like it:

private static bool IsSubArrayEqual(int[] x, int[] y, int start) {
    for (int i = 0; i < y.Length; i++) {
        if (x[start++] != y[i]) return false;
    }
    return true;
}
public static int StartingIndex(this int[] x, int[] y) {
    int max = 1 + x.Length - y.Length;
    for(int i = 0 ; i < max ; i++) {
        if(IsSubArrayEqual(x,y,i)) return i;
    }
    return -1;
}

Marc Gravell 2009-11-23 00:00:08

Answer 4

+2 A:

Here is a simple (yet fairly efficient) implementation that finds all occurances of the array, not just the first one:

static class ArrayExtensions {

  public static IEnumerable<int> StartingIndex(this int[] x, int[] y) {
    IEnumerable<int> index = Enumerable.Range(0, x.Length - y.Length + 1);
    for (int i = 0; i < y.Length; i++) {
      index = index.Where(n => x[n + i] == y[i]).ToArray();
    }
    return index;
  }

}

Example:

int[] x = { 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4 };
int[] y = { 2, 3 };
foreach (int i in x.StartingIndex(y)) {
  Console.WriteLine(i);
}

Output:

1
5
9

The method first loops through the x array to find all occurances of the first item in the y array, and place the index of those in the index array. Then it goes on to reduce the matches by checking which of those also match the second item in the y array. When all items in the y array is checked, the index array contains only the full matches.

Edit:
An alternative implementation would be to remove the ToArray call from the statement in the loop, making it just:

index = index.Where(n => x[n + i] == y[i]);

This would totally change how the method works. Instead of looping through the items level by level, it would return an enumerator with nested expressions, deferring the search to the time when the enumerator was iterated. That means that you could get only the first match if you wanted:

int index = x.StartingIndex(y).First();

This would not find all matches and then return the first, it would just search until the first was found and then return it.

Guffa 2009-11-23 00:00:50

@Guffa you seem to be pretty familiar with Enumerable, you've used similar approach in answering my other question http://stackoverflow.com/questions/1253454

Jeffrey C 2009-11-23 00:25:48

@Jeffrey: I added an explanation of the algorithm above.

Guffa 2009-11-23 00:34:53

This is a clever solution, although perhaps not the simplest.I also use "x.Length - y.Length + 1" in my solution. It's to avoid starting a search very close to the end of the x array, where there is no space for y. If you used x.Length instead, you'd get an index out of range exception.A deficiency with this method is that if you only want the first match it still has to search the entire string.

Mark Byers 2009-11-23 01:30:41

@Mark: I added an alternative approach above, that would solve the issue with getting only the first match.

Guffa 2009-11-23 03:05:11

Answer 5

+1 A:

"Simplest" and "fastest" are opposites in this case, and besides, in order to describe fast algorithms we need to know lots of things about how the source array and the search array are related to each other.

This is essentially the same problem as finding a substring inside a string. Suppose you are looking for "fox" in "the quick brown fox jumps over the lazy dog". The naive string matching algorithm is extremely good in this case. If you are searching for "bananananananananananananananana" inside a million-character string that is of the form "banananananabanananabananabananabanananananbananana..." then the naive substring matching algorithm is terrible -- far faster results can be obtained by using more complex and sophisticated string matching algorithms. Basically, the naive algorithm is O(nm) where n and m are the lengths of the source and search strings. There are O(n+m) algorithms but they are far more complex.

Can you tell us more about the data you're searching? How big is it, how redundant is it, how long are the search arrays, and what is the likelihood of a bad match?

Eric Lippert 2009-11-23 00:23:39

I think you are making simple things complicate. We are not talking like Google search indexing here. But if you could would you able to explain at each level what the different implementations might be? Thanks.

Jeffrey C 2009-11-23 00:46:26

You're the one who posted the vague question; I don't know what size your data set is, what your application is, or what your performance requirements are. It's unreasonable of you to expect that I would. Furthermore, a 600 character comment is hardly the place to summarize the vast literature on efficient string searching algorithms; pick up a good university undergraduate textbook on algorithmic design and you'll get plenty of examples of different algorithms for substring matching.

Eric Lippert 2009-11-23 02:35:11

ansaurus

tags:

views:

answers:

Find the first occurrence/starting index of the sub-array in C#.

related questions