views:

897

answers:

7

In implementing a basic Scheme interpreter in C# I discovered, to my horror, the following problem:

IEnumerator doesn't have a clone method! (or more precisely, IEnumerable can't provide me with a "cloneable" enumerator).

What I'd like:

interface IEnumerator<T>
{
    bool MoveNext();
    T Current { get; }
    void Reset();
    // NEW!
    IEnumerator<T> Clone();
}

I cannot come up with an implementation of IEnumerable that would not be able to supply an efficiently cloneable IEnumerator (vectors, linked lists, etc. all would be able to provide a trivial implementation of IEnumerator's Clone() as specified above... it would be easier than providing a Reset() method anyway!).

The absence of the Clone method means that any functional/recursive idiom of enumerating over a sequence won't work.

It also means I can't "seamlessly" make IEnumerable's behave like Lisp "lists" (for which you use car/cdr to enumerate recursively). i.e. the only implemention of "(cdr some IEnumerable)" would be woefully inefficient.

Can anyone suggest a realistic, useful, example of an IEnumerable object that wouldn't be able to provide an efficient "Clone()" method? Is it that there'd be a problem with the "yield" construct?

Can anyone suggest a workaround?

+3  A: 

As a workaround, you could easily make an extension method for IEnumerator which did your cloning. Just create a list from the enumerator, and use the elements as members.

You'd lose the streaming capabilities of an enumerator, though - since you're new "clone" would cause the first enumerator to fully evaluate.

Reed Copsey
best, most useful answer so far... but still would like an example of why they couldn't make IEnumerator cloneable... (highlighted the bit in bold in my question).
Paul Hollingsworth
As I mentioned - any case where you're taking advantage of the streaming (deferred execution) of an IEnumerator would break or at least lose the advantages, as soon as you cloned the enumerator. It's more of a matter of the costs vs. benefits of providing cloning functionality.
Reed Copsey
yeah no I realise the cost in your solution.. the point is, is there any realistic *implementation* if IEnumerable/IEnumerator that wouldn't be able to provide an efficient Clone method...?
Paul Hollingsworth
Infinite (or unbounded repeating) sequences will break this workaround. As will sequences that read from a potentially infinite stream of input.
Daniel Earwicker
@Earwicker: Good point about infinite enumerables.
Reed Copsey
+1  A: 

Why not this as an extension method:

public static IEnumerator<T> Clone(this IEnumerator<T> original)
{
    foreach(var v in original)
        yield return v;
}

This would basically create and return a new enumerator without fully evaluating the original.

Edit: Yep, I misread. Paul is correct, this would only work with IEnumerable.

That won't work - you can't foreach over an IEnumerator, right? Only over an IEnumerable...
Paul Hollingsworth
This code won't even compile, you can't foreach over IEnumerator<T> original and you can't use yield to return IEnumerator<T>. You probably meant IEnumerable<T> in both cases.
Lucas
A: 

There already is a way to create a new enumerator -- the same way you created the first one: IEnumerable.GetEnumerator. I'm not sure why you need another mechanism to do the same thing.

And in the spirit of the DRY principle, I'm curious as to why you would want the responsibility for creating new IEnumerator instances to be duplicated in both your enumerable and your enumerator classes. You would be forcing the enumerator to maintain additional state beyond what's required.

For example, imagine an enumerator for a linked list. For the basic implementation of IEnumerable, that class would only need to keep a reference to the current node. But to support your Clone, it would also need to keep a reference to the head of the list -- something it otherwise has no use for*. Why would you add that extra state to the enumerator, when you can just go to the source (the IEnumerable) and get another enumerator?

And why would you double the number of code paths you need to test? Every time you make a new way to manufacture an object, you're adding complexity.

* You would also need the head pointer if you implemented Reset, but according to the docs, Reset is only there for COM interop, and you're free to throw a NotSupportedException.

Joe White
A Cloned enumerator would not point at the first item in the sequence. It would point to the current item of the original enumerator. The purpose would be to be able to remember the current position.
Daniel Earwicker
Some enumerators could clone their current position, but many could not. Forward-only database cursors, for example; or anything that reads from a network stream or pipe. Even something that read from a local resource like a file stream would take a lot of work to make cloneable, since a file handle only has one current position.
Joe White
+14  A: 

The logic is inexorable! IEnumerable doesn't support Clone, and you need Clone, so you shouldn't be using IEnumerable.

Or more accurately, you shouldn't be using it as the fundamental basis for work on a Scheme interpreter. Why not make a trivial immutable linked list instead?

public class Link<TValue>
{
    private readonly TValue value;
    private readonly Link<TValue> next;

    public Link(TValue value, Link<TValue> next)
    {
        this.value = value;
        this.next = next;
    } 

    public TValue Value 
    { 
        get { return value; }
    }

    public Link<TValue> Next 
    {
        get { return next; }
    }

    public IEnumerable<TValue> ToEnumerable()
    {
        for (Link<TValue> v = this; v != null; v = v.next)
            yield return v.value;
    }
}

Note that the ToEnumerable method gives you convenient usage in the standard C# way.

To answer your question:

Can anyone suggest a realistic, useful, example of an IEnumerable object that wouldn't be able to provide an efficient "Clone()" method? Is it that there'd be a problem with the "yield" construct?

An IEnumerable can go anywhere in the world for its data. Here's an example that reads lines from the console:

IEnumerable<string> GetConsoleLines()
{
    for (; ;)
        yield return Console.ReadLine();
}

There are two problems with this: firstly, a Clone function would not be particularly straightforward to write (and Reset would be meaningless). Secondly, the sequence is infinite - which is perfectly allowable. Sequences are lazy.

Another example:

IEnumerable<int> GetIntegers()
{
    for (int n = 0; ; n++)
        yield return n;
}

For both these examples, the "workaround" you've accepted would not be much use, because it would just exhaust the available memory or hang up forever. But these are perfectly valid examples of sequences.

To understand C# and F# sequences, you need to look at lists in Haskell, not lists in Scheme.

In case you think the infinite stuff is a red herring, how about reading the bytes from a socket:

IEnumerable<byte> GetSocketBytes(Socket s)
{
    byte[] buffer = new bytes[100];
    for (;;)
    {
        int r = s.Receive(buffer);
        if (r == 0)
            yield break;

        for (int n = 0; n < r; n++)
            yield return buffer[n];       
    }
}

If there is some number of bytes being sent down the socket, this will not be an infinite sequence. And yet writing Clone for it would be very difficult. How would the compiler generate the IEnumerable implementation to do it automatically?

As soon as there was a Clone created, both instances would now have to work from a buffer system that they shared. It's possible, but in practice it isn't needed - this isn't how these kinds of sequences are designed to be used. You treat them purely "functionally", like values, applying filters to them recursively, rather than "imperatively" remembering a location within the sequence. It's a little cleaner than low-level car/cdr manipulation.

Further question:

I wonder, what's the lowest level "primitive(s)" I would need such that anything I might want to do with an IEnumerable in my Scheme interpreter could be implemented in scheme rather than as a builtin.

The short answer I think would be to look in Abelson and Sussman and in particular the part about streams. IEnumerable is a stream, not a list. And they describe how you need special versions of map, filter, accumulate, etc. to work with them. They also get onto the idea of unifying lists and streams in section 4.2.

Daniel Earwicker
Yeah, I know how to write a linked list. I'm trying to avoid making all C# code (particularly, any custom functions) having to convert all IEnumerables into linked lists up front. I realise that's not possible... but now I'm wondering *why* didn't they make IEnumerator cloneable?
Paul Hollingsworth
note: I up voted your answer anyway - thanks of the input
Paul Hollingsworth
See updated answer for the "why", also a reason why the "make a copy" workaround is not quite right.
Daniel Earwicker
OK... I suppose though they shouldn't have had to put Reset() on it if they weren't willing to put Clone() on it too.
Paul Hollingsworth
Reset was indeed a mistake - IEnumerable<T> is the best that could be done, but it's a compromise because it inherits from the "version one" non-generic IEnumerable, which had Reset.
Daniel Earwicker
Hmm interesting point regarding "car/cdr" being too low level... so, I wonder, what's the lowest level "primitive(s)" I would need such that anything I might want to do with an IEnumerable in my Scheme interpreter could be implemented in scheme rather than as a builtin...
Paul Hollingsworth
Good question, see update.
Daniel Earwicker
+1 for the best answer I've seen all day. :)
Greg D
A: 

This might help. It needs some code to call the Dispose() on the IEnumerator:

class Program
{
    static void Main(string[] args)
    {
        //var list = MyClass.DequeueAll().ToList();
        //var list2 = MyClass.DequeueAll().ToList();

        var clonable = MyClass.DequeueAll().ToClonable();


        var list = clonable.Clone().ToList();
        var list2 = clonable.Clone()ToList();
        var list3 = clonable.Clone()ToList();
    }
}

class MyClass
{
    static Queue<string> list = new Queue<string>();

    static MyClass()
    {
        list.Enqueue("one");
        list.Enqueue("two");
        list.Enqueue("three");
        list.Enqueue("four");
        list.Enqueue("five");
    }

    public static IEnumerable<string> DequeueAll()
    {
        while (list.Count > 0)
            yield return list.Dequeue();
    }
}

static class Extensions
{
    public static IClonableEnumerable<T> ToClonable<T>(this IEnumerable<T> e)
    {
        return new ClonableEnumerable<T>(e);
    }
}

class ClonableEnumerable<T> : IClonableEnumerable<T>
{
    List<T> items = new List<T>();
    IEnumerator<T> underlying;

    public ClonableEnumerable(IEnumerable<T> underlying)
    {
        this.underlying = underlying.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        return new ClonableEnumerator<T>(this);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    private object GetPosition(int position)
    {
        if (HasPosition(position))
            return items[position];

        throw new IndexOutOfRangeException();
    }

    private bool HasPosition(int position)
    {
        lock (this)
        {
            while (items.Count <= position)
            {
                if (underlying.MoveNext())
                {
                    items.Add(underlying.Current);
                }
                else
                {
                    return false;
                }
            }
        }

        return true;
    }

    public IClonableEnumerable<T> Clone()
    {
        return this;
    }


    class ClonableEnumerator<T> : IEnumerator<T>
    {
        ClonableEnumerable<T> enumerable;
        int position = -1;

        public ClonableEnumerator(ClonableEnumerable<T> enumerable)
        {
            this.enumerable = enumerable;
        }

        public T Current
        {
            get
            {
                if (position < 0)
                    throw new Exception();
                return (T)enumerable.GetPosition(position);
            }
        }

        public void Dispose()
        {
        }

        object IEnumerator.Current
        {
            get { return this.Current; }
        }

        public bool MoveNext()
        {
            if(enumerable.HasPosition(position + 1))
            {
                position++;
                return true;
            }
            return false;
        }

        public void Reset()
        {
            position = -1;
        }
    }


}

interface IClonableEnumerable<T> : IEnumerable<T>
{
    IClonableEnumerable<T> Clone();
}
taoufik
+2  A: 

If you can let the original enumerator go, ie. not use it any more, you can implement a "clone" function that takes the original enumerator, and uses it as the source for one or more enumerators.

In other words, you could build something like this:

IEnumerable<String> original = GetOriginalEnumerable();
IEnumerator<String>[] newOnes = original.GetEnumerator().AlmostClone(2);
                                                         ^- extension method
                                                         produce 2
                                                         new enumerators

These could internally share the original enumerator, and a linked list, to keep track of the enumerated values.

This would allow for:

  • Infinite sequences, as long as both enumerators progress forward (the linked list would be written such that once both enumerators have passed a specific point, those can be GC'ed)
  • Lazy enumeration, the first of the two enumerators that need a value that hasn't been retrieved from the original enumerator yet, it would obtain it and store it into the linked list before yielding it

Problem here is of course that it would still require a lot of memory if one of the enumerators move far ahead of the other one.

Here is the source code. If you use Subversion, you can download the Visual Studio 2008 solution file with a class library with the code below, as well as a separate unit test porject.

Repository: http://vkarlsen.serveftp.com:81/svnStackOverflow/SO847655
Username and password is both 'guest', without the quotes.

Note that this code is not thread-safe, at all.

public static class EnumeratorExtensions
{
    /// <summary>
    /// "Clones" the specified <see cref="IEnumerator{T}"/> by wrapping it inside N new
    /// <see cref="IEnumerator{T}"/> instances, each can be advanced separately.
    /// See remarks for more information.
    /// </summary>
    /// <typeparam name="T">
    /// The type of elements the <paramref name="enumerator"/> produces.
    /// </typeparam>
    /// <param name="enumerator">
    /// The <see cref="IEnumerator{T}"/> to "clone".
    /// </param>
    /// <param name="clones">
    /// The number of "clones" to produce.
    /// </param>
    /// <returns>
    /// An array of "cloned" <see cref="IEnumerator[T}"/> instances.
    /// </returns>
    /// <remarks>
    /// <para>The cloning process works by producing N new <see cref="IEnumerator{T}"/> instances.</para>
    /// <para>Each <see cref="IEnumerator{T}"/> instance can be advanced separately, over the same
    /// items.</para>
    /// <para>The original <paramref name="enumerator"/> will be lazily evaluated on demand.</para>
    /// <para>If one enumerator advances far beyond the others, the items it has produced will be kept
    /// in memory until all cloned enumerators advanced past them, or they are disposed of.</para>
    /// </remarks>
    /// <exception cref="ArgumentNullException">
    /// <para><paramref name="enumerator"/> is <c>null</c>.</para>
    /// </exception>
    /// <exception cref="ArgumentOutOfRangeException">
    /// <para><paramref name="clones"/> is less than 2.</para>
    /// </exception>
    public static IEnumerator<T>[] Clone<T>(this IEnumerator<T> enumerator, Int32 clones)
    {
        #region Parameter Validation

        if (Object.ReferenceEquals(null, enumerator))
            throw new ArgumentNullException("enumerator");
        if (clones < 2)
            throw new ArgumentOutOfRangeException("clones");

        #endregion

        ClonedEnumerator<T>.EnumeratorWrapper wrapper = new ClonedEnumerator<T>.EnumeratorWrapper
        {
            Enumerator = enumerator,
            Clones = clones
        };
        ClonedEnumerator<T>.Node node = new ClonedEnumerator<T>.Node
        {
            Value = enumerator.Current,
            Next = null
        };

        IEnumerator<T>[] result = new IEnumerator<T>[clones];
        for (Int32 index = 0; index < clones; index++)
            result[index] = new ClonedEnumerator<T>(wrapper, node);
        return result;
    }
}

internal class ClonedEnumerator<T> : IEnumerator<T>, IDisposable
{
    public class EnumeratorWrapper
    {
        public Int32 Clones { get; set; }
        public IEnumerator<T> Enumerator { get; set; }
    }

    public class Node
    {
        public T Value { get; set; }
        public Node Next { get; set; }
    }

    private Node _Node;
    private EnumeratorWrapper _Enumerator;

    public ClonedEnumerator(EnumeratorWrapper enumerator, Node firstNode)
    {
        _Enumerator = enumerator;
        _Node = firstNode;
    }

    public void Dispose()
    {
        _Enumerator.Clones--;
        if (_Enumerator.Clones == 0)
        {
            _Enumerator.Enumerator.Dispose();
            _Enumerator.Enumerator = null;
        }
    }

    public T Current
    {
        get
        {
            return _Node.Value;
        }
    }

    Object System.Collections.IEnumerator.Current
    {
        get
        {
            return Current;
        }
    }

    public Boolean MoveNext()
    {
        if (_Node.Next != null)
        {
            _Node = _Node.Next;
            return true;
        }

        if (_Enumerator.Enumerator.MoveNext())
        {
            _Node.Next = new Node
            {
                Value = _Enumerator.Enumerator.Current,
                Next = null
            };
            _Node = _Node.Next;
            return true;
        }

        return false;
    }

    public void Reset()
    {
        throw new NotImplementedException();
    }
}
Lasse V. Karlsen
Clever... so it's kinda of like a lazily evaluated version of Reed Copsey's answer (so it only uses all elements of the list if you actually use it that way).
Paul Hollingsworth
Right, I'll post an example implementation a bit later, teaching a class at the moment.
Lasse V. Karlsen
Will be very interested to see this - I started writing up the same thing for my answer the other day, but never finished it.
Daniel Earwicker
Although see taoufik's answer.
Daniel Earwicker
Posted the source code. Some optimization techniques could be employed, for instance, instead of a linked list with only one value per node, each node could contain an array of items. This would complicate the design, somewhat, but would still be doable.
Lasse V. Karlsen
A: 

This uses reflection to create a new instance and then sets the values on the new instance. I also found this chapter from C# in Depth to be very useful. Iterator block implementation details: auto-generated state machines

static void Main()
{
    var counter = new CountingClass();
    var firstIterator = counter.CountingEnumerator();
    Console.WriteLine("First list");
    firstIterator.MoveNext();
    Console.WriteLine(firstIterator.Current);

    Console.WriteLine("First list cloned");
    var secondIterator = EnumeratorCloner.Clone(firstIterator);

    Console.WriteLine("Second list");
    secondIterator.MoveNext();
    Console.WriteLine(secondIterator.Current);
    secondIterator.MoveNext();
    Console.WriteLine(secondIterator.Current);
    secondIterator.MoveNext();
    Console.WriteLine(secondIterator.Current);

    Console.WriteLine("First list");
    firstIterator.MoveNext();
    Console.WriteLine(firstIterator.Current);
    firstIterator.MoveNext();
    Console.WriteLine(firstIterator.Current);
}

public class CountingClass
{
    public IEnumerator<int> CountingEnumerator()
    {
        int i = 1;
        while (true)
        {
            yield return i;
            i++;
        }
    }
}

public static class EnumeratorCloner
{
    public static T Clone<T>(T source) where T : class, IEnumerator
    {
        var sourceType = source.GetType().UnderlyingSystemType;
        var sourceTypeConstructor = sourceType.GetConstructor(new Type[] { typeof(Int32) });
        var newInstance = sourceTypeConstructor.Invoke(new object[] { -2 }) as T;

        var nonPublicFields = source.GetType().GetFields(BindingFlags.NonPublic | BindingFlags.Instance);
        var publicFields = source.GetType().GetFields(BindingFlags.Public | BindingFlags.Instance);
        foreach (var field in nonPublicFields)
        {
            var value = field.GetValue(source);
            field.SetValue(newInstance, value);
        }
        foreach (var field in publicFields)
        {
            var value = field.GetValue(source);
            field.SetValue(newInstance, value);
        }
        return newInstance;
    }
}

This answer was also used on the following question Is it possible to clone an IEnumerable instance, saving a copy of the iteration state?

BenMaddox