views:

1197

answers:

7

I have two arrays built while parsing a text file. The first contains the column names, the second contains the values from the current row. I need to iterate over both lists at once to build a map. Right now I have the following:

var currentValues = currentRow.Split(separatorChar);
var valueEnumerator = currentValues.GetEnumerator();

foreach (String column in columnList)
{
    valueEnumerator.MoveNext();
    valueMap.Add(column, (String)valueEnumerator.Current);
}

This works just fine, but it doesn't quite satisfy my sense of elegance, and it gets really hairy if the number of arrays is larger than two (as I have to do occasionally). Does anyone have another, terser idiom?

+2  A: 

If you're really using arrays, the best way is probably just to use the conventional for loop with indices. Not as nice, granted, but as far as I know .NET doesn't offer a better way of doing this.

You could also encapsulate your code into a method called zip – this is a common higher-order list function. However, C# lacking a suitable Tuple type, this is quite crufty. You'd end up returning an IEnumerable<KeyValuePair<T1, T2>> which isn't very nice.

By the way, are you really using IEnumerable instead of IEnumerable<T> or why do you cast the Current value?

Konrad Rudolph
+11  A: 

if there are the same number of column names as there are elements in each row, could you not use a for loop?

var currentValues = currentRow.Split(separatorChar);

for(var i=0;i<columnList.Length;i++){
   // use i to index both (or all) arrays and build your map
}
inkedmn
I would recommend a check to see if the two arrays length are the same before entering this loop.
James McMahon
I second nemo's comment.
epochwolf
while I didn't include that check in the actual code sample, I did say that at the beginning of my response ;)Point taken, though!
inkedmn
+2  A: 

Instead of creating two seperate arrays you could make a two-dimensional array, or a dictionary (which would be better). But really, if it works I wouldn't try to change it.

Jared
A: 

You could create a templated enumerator a la Pairenumerable. Personally I think that's overkill.

Brian
+12  A: 

You've got a non-obvious pseudo-bug in your initial code - IEnumerator<T> extends IDisposable so you should dispose it. This can be very important with iterator blocks! Not a problem for arrays, but would be with other IEnumerable<T> implementations.

I'd do it like this:

public static IEnumerable<TResult> PairUp<TFirst,TSecond,TResult>
    (this IEnumerable<TFirst> source, IEnumerable<TSecond> secondSequence,
     Func<TFirst,TSecond,TResult> projection)
{
    using (IEnumerator<TSecond> secondIter = secondSequence.GetEnumerator())
    {
        foreach (TFirst first in source)
        {
            if (!secondIter.MoveNext())
            {
                throw new ArgumentException
                    ("First sequence longer than second");
            }
            yield return projection(first, secondIter.Current);
        }
        if (secondIter.MoveNext())
        {
            throw new ArgumentException
                ("Second sequence longer than first");
        }
    }        
}

Then you can reuse this whenever you have the need:

foreach (var pair in columnList.PairUp(currentRow.Split(separatorChar),
             (column, value) => new { column, value })
{
    // Do something
}

Alternatively you could create a generic Pair type, and get rid of the projection parameter in the PairUp method.

EDIT:

With the Pair type, the calling code would look like this:

foreach (var pair in columnList.PairUp(currentRow.Split(separatorChar))
{
    // column = pair.First, value = pair.Second
}

That looks about as simple as you can get. Yes, you need to put the utility method somewhere, as reusable code. Hardly a problem in my view. Now for multiple arrays...

If the arrays are of different types, we have a problem. You can't express an arbitrary number of type parameters in a generic method/type declaration - you could write versions of PairUp for as many type parameters as you wanted, just like there are Action and Func delegates for up to 4 delegate parameters - but you can't make it arbitrary.

If the values will all be of the same type, however - and if you're happy to stick to arrays - it's easy. (Non-arrays is okay too, but you can't do the length checking ahead of time.) You could do this:

public static IEnumerable<T[]> Zip<T>(params T[][] sources)
{
    // (Insert error checking code here for null or empty sources parameter)

    int length = sources[0].Length;
    if (!sources.All(array => array.Length == length))
    {
        throw new ArgumentException("Arrays must all be of the same length");
    }

    for (int i=0; i < length; i++)
    {
        // Could do this bit with LINQ if you wanted
        T[] result = new T[sources.Length];
        for (int j=0; j < result.Length; j++)
        {
             result[j] = sources[j][i];
        }
        yield return result;
    }
}

Then the calling code would be:

foreach (var array in Zip(columns, row, whatevers))
{
    // column = array[0]
    // value = array[1]
    // whatever = array[2]
}

This involves a certain amount of copying, of course - you're creating an array each time. You could change that by introducing another type like this:

public struct Snapshot<T>
{
    readonly T[][] sources;
    readonly int index;

    public Snapshot(T[][] sources, int index)
    {
        this.sources = sources;
        this.index = index;
    }

    public T this[int element]
    {
        return sources[element][index];
    }
}

This would probably be regarded as overkill by most though ;)

I could keep coming up with all kinds of ideas, to be honest... but the basics are:

  • With a little bit of reusable work, you can make the calling code nicer
  • For arbitrary combinations of types you'll have to do each number of parameters (2, 3, 4...) separately due to the way generics works
  • If you're happy to use the same type for each part, you can do better
Jon Skeet
Downvoters - reasons please!
Jon Skeet
Wow, massive overkill and it still doesn't work with more then two arrays.
Hippiehunter
You're right that it doesn't work with more than two arrays - but it's elegant and highly reusable for two arrays. I think it may be in .NET 4.0 in fact - it was an operator MS had intended to include in LINQ...
Jon Skeet
(Or for any two sequences, I should say - not just arrays. And it streams the data, so it can work for potentially infinite data sources.)
Jon Skeet
its so elegant that it missed his rather specific use
Hippiehunter
+1 cause it answers OP question perfectly. Plus, I never would have come up with that in a million years
Jared
@Hippiehunter: In what way does it miss the original specific use? Yes, it doesn't cope with the case of more than two arrays, but other than that it's just what he needs, IMO. I'll address the "more than two arrays" in another edit in 20 minutes. (Putting kids to bed...)
Jon Skeet
+1: very cool. It looks overkill, but the cost of writing code like this pays for itself if the method is used in several places.
Juliet
I still think you've crafted the "Incredible Machine"
Hippiehunter
It looks overly complicated when a single for loop would probably work.
epochwolf
@epochwolf: If you're only going to use it once, then sure a for loop would be better. That could be said of everything in LINQ though. The point is the reusability. With the Pair type the calling code would be even simpler. Will add to my answer in a bit.
Jon Skeet
Right, now with lots of added stuff :)
Jon Skeet
This answer is supercool.
Ed Schwehm
I admire this for comprehensiveness and extensibility, so +1. But I gave the answer to the simple, retrospectively obvious solution :).
JSBangs
+1  A: 

Use IEnumerator for both would be nice

var currentValues = currentRow.Split(separatorChar);
using (IEnumerator<string> valueEnum = currentValues.GetEnumerator(), columnEnum = columnList.GetEnumerator()) {
    while (valueEnum.MoveNext() && columnEnum.MoveNext())
        valueMap.Add(columnEnum.Current, valueEnum.Current);
}

Or create an extension methods

public static IEnumerable<TResult> Zip<T1, T2, TResult>(this IEnumerable<T1> source, IEnumerable<T2> other, Func<T1, T2, TResult> selector) {
    using (IEnumerator<T1> sourceEnum = source.GetEnumerator()) {
        using (IEnumerator<T2> otherEnum = other.GetEnumerator()) {
            while (sourceEnum.MoveNext() && columnEnum.MoveNext())
                yield return selector(sourceEnum.Current, otherEnum.Current);
        }
    }
}

Usage

var currentValues = currentRow.Split(separatorChar);
foreach (var valueColumnPair in currentValues.Zip(columnList, (a, b) => new { Value = a, Column = b }) {
    valueMap.Add(valueColumnPair.Column, valueColumnPair.Value);
}
chaowman
+1  A: 

In a functional language you would usually find a "zip" function which will hopefully be part of a C#4.0 . Bart de Smet provides a funny implementation of zip based on existing LINQ functions:

public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
  this IEnumerable<TFirst> first, 
  IEnumerable<TSecond> second, 
  Func<TFirst, TSecond, TResult> func)
{
  return first.Select((x, i) => new { X = x, I = i })
    .Join(second.Select((x, i) => new { X = x, I = i }), 
    o => o.I, 
    i => i.I, 
    (o, i) => func(o.X, i.X));
}

Then you can do:

  int[] s1 = new [] { 1, 2, 3 };
  int[] s2 = new[] { 4, 5, 6 };
  var result = s1.Zip(s2, (i1, i2) => new {Value1 = i1, Value2 = i2});
flq