views:

407

answers:

5

So, I have a class with an array inside. Currently, my strategy for enumerating over the class's items is to use the code, foreach (item x in classInstance.InsideArray) . I would much rather use foreach (item x in classInstance) and make the array private. My main concern is that I really need to avoid anything slow; the array gets hit a lot (and has a couple hundred items). It is vital that enumerating over this array is cheap. One thought was to just have the class implement IEnumerable<item>, but InsideArray.getEnumerator() only gives me a non-generic enumerator. I also tried implementing the IEnumerable interface. This worked but was very slow, possibly due to boxing.

Is there a way to make the class itself enumerable without a performance hit?

Normal Code:

//Class
public class Foo {
    //Stuff
    public Item[,] InsideArray {get; private set;}
}

//Iteration.  Shows up all over the place
foreach (Item x in classInstance.InsideArray)
{
    //doStuff
}

Adjusted, much slower code:

//Class
public class Foo : IEnumerable {
    //Stuff
    private Item[,] InsideArray;
    System.Collections.IEnumerator System.Collections.IEnumerable GetEnumerator()
    {
        return InsideArray.GetEnumerator();
    }
}

//Iteration.  Shows up all over the place
foreach (Item x in classInstance)
{
    //doStuff
}

Note: Adding an implementation for the nongeneric iterator is possible and faster than my slow solution, but it is still a bit worse than just using the array directly. I was hoping there was a way to somehow tell C#, "hey, when I ask you to iterate over this object iterate over it's array, just as fast," but apparently that is not quite possible...at least from the answers suggested thus far.

A: 

How about adding an indexer to the class:

public MyInsideArrayType this[int index]
{
   get{return this.insideArray[index];
}

And if you REALLY need foreach capabilities:

public IEnumerable<MyInsideArrayType> GetEnumerator()
{
   for(int i = 0; i<this.insideArray.Count;i++)
   {
      yield return this[i];
   }
}
BFree
I'm already using an indexer, and the other suggestion you made is painfully slow.
Brian
Can't read your mind man, post your code and we'll know....
BFree
Brian: There's a difference between the non-generic IEnumerable solution you've shown in your edited question and the generic form shown in this answer.
Jon Skeet
@Jon: You are correct. And in fact this gives the same speed as Marc's answer, adding so much pain to my code. I'm too stubborn to give up those cycles, though.
Brian
+3  A: 

Cast your array to IEnumerable<item> before calling GetEnumerator() and you'll get the generic IEnumerator. For example:

string[] names = { "Jon", "Marc" };
IEnumerator<string> enumerable = ((IEnumerable<string>)names).GetEnumerator();

It may well still be a bit slower than enumerating the array directly with foreach (which the C# compiler does in a different way) but at least you won't have anything else in the way.

EDIT:

Okay, you said your other attempt used an indexer. You could try this approach, although I don't think it'll be any faster:

public IEnumerable<Item> Items
{
    get
    {
        foreach (Item x in items)
        {
            yield return x;
        }
    }
}

An alternative would be to try to avoid using a two-dimensional array to start with. Is that an absolute requirement? How often are you iterating over a single array after creating it? It may be worth taking a slight hit at creation time to make iteration cheaper.

EDIT: Another suggestion, which is slightly off the wall... instead of passing the iterator back to the caller, why not get the caller to say what to do with each item, using a delegate?

public void ForEachItem(Action action)
{
    foreach (Item item in items)
    {
        action(item);
    }
}

Downsides:

  • You incur the penalty of a delegate call on each access.
  • It's hard to break out of the loop (other than by throwing an exception). There are different ways of approaching this, but let's cross that bridge when we come to it.
  • Developers who aren't familiar with delegates may get a bit confused.
Jon Skeet
I omitted to mention that the inside array is two-dimensional.
Brian
I wasn't clear. I am using an indexer successfully, and it is not slow. But I don't want to use it with my foreach loop.
Brian
While I should be annoyed about making my inner array public, in all honesty I'm just as annoyed by seeing an unnecessary dot in my foreach loop.The delegate idea is very clever, but as often as not the looping is being used to read the items rather than perform actions, so it is not feasible for this specific program.
Brian
Reading *is* performing an action. For example, you could effectively copy the array using foo.ForEachItem(x => list.Add(x))
Jon Skeet
Please comment on my implementation of IEnumerable<Item> as well.
Jon Skeet
@Jon: Good point about actions. The added overhead of using a delegate is probably too high, though. And breaking out of the loop is rather common, though I suppose I could replace Action with a predicate and break out once it's true. That strikes me as making the code more confusing. As for IEnumerable<Item>, my issue with it is that I'd still have to have an extra dot in my foreach loop. Also, it runs about 5% slower.
Brian
To clarify, the whole program runs 5% slower because of the added cost of the loop.
Brian
Which dot are you talking about? Bear in mind that the iteration source is only evaluated once anyway. You seem to be bothered about that more than makes sense. The performance side is a more reasonable concern. Out of interest, is Item a class or a struct?
Jon Skeet
Item is currently a class. And your solution, though much cleaner, is a bit slower than Marc's (which is a bit slower than just foreach over the array.As an aside, this is a personal project for fun and the goal is not, "Make it fast enough for customers to be happy," but "Make it as fast as possible, but don't make the code unreadable...unless the speed-up is amazing."
Brian
A: 

All forms of iteration are cheap. If anyone in this day-and-age managed to somehow write and publish an expensive iterator they would be (rightly) burned at the stake.

Premature optimization is evil.

Cheers. Keith.

corlettk
It's not premature optimization at this point. My program is basically done, and all I've got left to do is optimization and minor cleanup. Using an enumerator myself ups the running time by 30% or so, since at this point all the program really does is iterate over arrays reading their data over and over again.
Brian
@Brian, 30% sounds like "a lot" to me, but I've never tried to measure the performance of the various iteration techniques over an empty loop in C#... My experience has been that much greater efficiencies are to be gained elsewhere, like using the right datastructure(s) for the access mode(s). My appologies for going off half cocked. No offence was intended. Cheers. Keith.
corlettk
30% *IS* a lot. However, keep in mind that during the iteration at the meatiest parts of the application, the program generally is just reading 1-2 bytes out of each element it iterates over and writing 1-2 bytes over a small fraction of those. The loops are *very* simple. Over time I've managed to cache a lot of the results of these loops but the loops will never go away completely.
Brian
A: 

If you can have the private array using some generic collection, you can simply implement IEnumerable in your class:

class SomeClass: IEnumerable<int>
{

    private List<int> _somePrivateData = new List<int>();

    public SomeClass()
    {
        _somePrivateData.AddRange(new int[] { 1, 4, 2, 3, 7, 6, 8, 34, 21, 45 });
    }

    IEnumerator<int> IEnumerable<int>.GetEnumerator()
    {
        return _somePrivateData.GetEnumerator();
    }

    IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return _somePrivateData.GetEnumerator();
    }

}

...then you can simply loop over the inside array like this:

SomeClass data = new SomeClass();
foreach (int item in data)
{
    // do something with the int
}

This way the inside array as such is not publicly visible, but can be iterated over.

Fredrik Mörk
This is exactly what I wanted to do, as my explanation indicates.
Brian
I think we will need to see some actual code (more than just fragments) from the class and how it is being used in order to be able to give you the help you look for. As BFree said; we can't read your mind.
Fredrik Mörk
@Fredrik: True, but I think my statement, "One thought was to just have the class implement IEnumerable<item>, but InsideArray.getEnumerator() only gives me a non-generic enumerator." was pretty clear. Jon offered a solution that didn't work to that issue (my fault for not specifying it was a multi-dimensional array).
Brian
Yes, it was actually rather clear; I should have read more carefully.
Fredrik Mörk
+3  A: 

A bespoke iterator might make it quicker (edited to return as known type):

Basic: 2468ms - -2049509440
Bespoke: 1087ms - -2049509440

(you would use the ArrayIterator directly as Foo's GetEnumerator - essentially copying the code from ArrayEnumerator.GetEnumerator; my point is to show that a typed iterator is faster than the interface)

With code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

class Foo
{
    public struct ArrayIterator<T> : IEnumerator<T>
    {
        private int x, y;
        private readonly int width, height;
        private T[,] data;
        public ArrayIterator(T[,] data)
        {
            this.data = data;
            this.width = data.GetLength(0);
            this.height = data.GetLength(1);
            x = y = 0;
        }
        public void Dispose() { data = null; }
        public bool MoveNext()
        {
            if (++x >= width)
            {
                x = 0;
                y++;
            }
            return y < height;
        }
        public void Reset() { x = y = 0; }
        public T Current { get { return data[x, y]; } }
        object IEnumerator.Current { get { return data[x, y]; } }
    }
    public sealed class ArrayEnumerator<T> : IEnumerable<T>
    {
        private readonly T[,] arr;
        public ArrayEnumerator(T[,] arr) { this.arr = arr; }

        public ArrayIterator<T> GetEnumerator()
        {
            return new ArrayIterator<T>(arr);
        }

        System.Collections.Generic.IEnumerator<T> System.Collections.Generic.IEnumerable<T>.GetEnumerator()
        {
            return GetEnumerator();
        }
        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }

    }
    public int[,] data;

    public IEnumerable<int> Basic()
    {
        foreach (int i in data) yield return i;
    }
    public ArrayEnumerator<int> Bespoke()
    {
        return new ArrayEnumerator<int>(data);
    }
    public Foo()
    {
        data = new int[500, 500];
        for (int x = 0; x < 500; x++)
            for (int y = 0; y < 500; y++)
            {
                data[x, y] = x + y;
            }
    }
    static void Main()
    {
        Test(1); // for JIT
        Test(500); // for real
        Console.ReadKey(); // pause
    }
    static void Test(int count)
    {
        Foo foo = new Foo();
        int chk;
        Stopwatch watch = Stopwatch.StartNew();
        chk = 0;
        for (int i = 0; i < count; i++)
        {
            foreach (int j in foo.Basic())
            {
                chk += j;
            }
        }
        watch.Stop();
        Console.WriteLine("Basic: " + watch.ElapsedMilliseconds + "ms - " + chk);

        watch = Stopwatch.StartNew();
        chk = 0;
        for (int i = 0; i < count; i++)
        {
            foreach (int j in foo.Bespoke())
            {
                chk += j;
            }
        }
        watch.Stop();
        Console.WriteLine("Bespoke: " + watch.ElapsedMilliseconds + "ms - " + chk);
    }
}
Marc Gravell
This does skip the first element and does iterate in a different order, but both are easily fixable. As written, it ends up creating new iterators over and over again, which is unneeded in my case.--Overall, I'd say this fulfills my criteria better than any other solution in terms of removing the extra dot and running almost as fast as the original version. It adds a little more extra code than I'd like, but I don't think there's a better solution.--I suppose I'll just suck it up and live with the dot; I really don't want to add this much extra code and I'm stingy with my performance.
Brian
Sorry about the off-by-one etc - as you say, easy to fix. Re "creating new iterators over and over" - it only does this per call to `GetEnumerator()` - i.e. per `foreach`, and this is necessary to allow parallel iteration. But this doesn't have any significant object cost, as the iterator is a struct. The struct/iterator is a common pattern (see List<T> etc).
Marc Gravell
Fair enough. That said, it's still a bit slower than just enumerating over the array with a foreach. Not especially surprising, though. Oh well.
Brian