views:

3566

answers:

5

I have two or more arrays -- one with IDs, one or more with string values. I want to merge these into a hash table so I can look up values by ID.

The following function does the job, but a shorter and sweeter version (LINQ?) would be nice:

Dictionary<int, string[]> MergeArrays( IEnumerable<int> idCollection,
                                       params IEnumerable<string>[] valueCollections )
{
 var dict = new Dictionary<int, string[]>();

 var idL = idCollection.Count();
 while ( idL-- > 0 )
 {
  dict[idCollection.ElementAt( idL )] = new string[valueCollections.Length];

  var vL = valueCollections.Length;
  while ( vL-- > 0 )
   dict[idCollection.ElementAt( idL )][vL] = valueCollections[vL].ElementAt( idL );
 }

 return dict;
}

Any ideas?

+3  A: 

That's very inefficent at the moment - all those calls to ElementAt could be going through the whole sequence (as far as they need to) each time. (It depends on the implementation of the sequence.)

However, I'm not at all sure I even understand what this code is doing (using foreach loops would almost certainly make it clearer, as would iterating forwards instead of backwards. Could you give some sample input? and expected outputs?

EDIT: Okay, I think I see what's going on here; you're effectively pivoting valueCollections. I suspect you'll want something like:

static Dictionary<int, string[]> MergeArrays(
    IEnumerable<int> idCollection,
    params IEnumerable<string>[] valueCollections)
{
    var valueCollectionArrays = valueCollections.Select
         (x => x.ToArray()).ToArray();
    var indexedIds = idCollection.Select((Id, Index) => new { Index, Id });

    return indexedIds.ToDictionary(x => Id, 
        x => valueCollectionArrays.Select(array => array[x.Index]).ToArray());
}

It's pretty ugly though. If you can make idCollection an array to start with, it would frankly be easier.

EDIT: Okay, assuming we can use arrays instead:

static Dictionary<int, string[]> MergeArrays(
    int[] idCollection,
    params string[][] valueCollections)
{
    var ret = new Dictionary<int, string[]>();
    for (int i=0; i < idCollection.Length; i++)
    {
         ret[idCollection[i]] = valueCollections.Select
             (array => array[i]).ToArray();
    }
    return ret;
}

I've corrected (hopefully) a bug in the first version - I was getting confused between which bit of the values was an array and which wasn't. The second version isn't as declarative, but I think it's clearer, personally.

Jon Skeet
> all those calls to ElementAt will be going through the whole sequenceThat depends on what is implementing the IEnumerable. If its actually an array it will be very fast. If you use something like a linked list it is inefficient.
Dan Fish
@Dan: I wasn't aware that Enumerable.ElementAt checked for an implementation of IList<T>. That makes a big difference. Will edit.
Jon Skeet
In my case, both idCollection and the valueCollections are arrays. I just tried to make it more generic -- a classic example of pointlessly trying to anticipate future needs. These are arrays of form data that I need to join, and they'll never be anything but arrays.
Bergius
Not the beautiful LINQ expression I was imagining, but easily the most performant option. Your first version is half as fast, but perhaps more versatile. Accepted answer, anyway.
Bergius
@Bergius: It's the pivoting which causes the problem, really. With extra extension methods it may well be more feasible (and I'll bear that in mind for another project) but I think the second option is more pragmatic. LINQ isn't always the best way to go :)
Jon Skeet
I know, but it's a lot of fun! :)
Bergius
A: 

Unless I'm missing something, you don't need the code to duplicate the arrays - they're objects in their own right and aren't going to disappear unexpectedly.

Use Select() to combine the values together using an anonymous class, then ToDictionary() to bottle them up.

Try this:

    IDictionary<int, IEnumerable<string>> MergeArrays2(
        IEnumerable<int> idCollection,
        params IEnumerable<string>[] valueCollections)
    {
        var values = valueCollections.ToList();
        return idCollection.Select(
            (id, index) => new { Key = id, Value = values[index] })
            .ToDictionary(x => x.Key, x => x.Value);
    }

Note the return type uses IEnumerable instead of string[] - IMHO you'll find this more flexible than using arrays.

Also, the return type is using an Interface.

Updated: John Skeet made a good point (see comment below) that the arrays are mutable, and this might be an issue. See below for an easy change to create fresh arrays:

    IDictionary<int, IEnumerable<string>> MergeArrays2(
        IEnumerable<int> idCollection,
        params IEnumerable<string>[] valueCollections)
    {
        var values = valueCollections.ToList();
        return idCollection.Select(
            (id, index) => new 
            { 
                Key = id, 
                Value = values[index].ToArray()  // Make new array with values
            })
            .ToDictionary(x => x.Key, x => x.Value);
    }
Bevan
There are two reasons to "recreate" the arrays - the first is that it's not actually taking the existing arrays, it's creating an array by taking an element from *each* array. However, even if it weren't, they could be *mutated* later. May or may not be an issue.
Jon Skeet
+2  A: 

How about:

        public static Dictionary<int, string[]> MergeArrays2(IEnumerable<int> idCollection,
        params IEnumerable<string>[] valueCollections)
    {
        var dict = new Dictionary<int, string[]>();
        var valEnums = (from v in valueCollections select v.GetEnumerator()).ToList();
        foreach (int id in idCollection)
        {
            var strings = new List<string>();
            foreach (var e in valEnums)
                if (e.MoveNext())
                    strings.Add(e.Current);
            dict.Add(id, strings.ToArray());
        }
        return dict;
    }

or slightly editing the skeet answer (it didn't work for me):

        static Dictionary<int, string[]> MergeArrays_Skeet(IEnumerable<int> idCollection,params IEnumerable<string>[] valueCollections)
    {
        var valueCollectionArrays = valueCollections.Select(x=>x.ToArray()).ToArray();
        var indexedIds = idCollection.Select((Id, Index) => new { Index, Id });
        return indexedIds.ToDictionary(x => x.Id,x => valueCollectionArrays.Select(array => array[x.Index]).ToArray());
    }
Dan Fish
+1  A: 

Here's some elegance. It's longer than I like, but it's very explorable.

    public Dictionary<int, string[]> MergeArrays(
        IEnumerable<int> idCollection,
        params IEnumerable<string>[] valueCollections
            )
    {
        Dictionary<int, int> ids = idCollection
            .ToDictionaryByIndex();
        //
        Dictionary<int, List<string>> values =
            valueCollections.Select(x => x.ToList())
            .ToList()
            .Pivot()
            .ToDictionaryByIndex();
        //
        Dictionary<int, string[]> result =
            ids.ToDictionary(
                z => z.Value,
                z => values[z.Key].ToArray()
            );

        return result;
    }

And here's the helper methods that I used.

   public static List<List<T>> Pivot<T>
        (this List<List<T>> source)
    {
        return source
            .SelectMany((it) =>
              it.Select((t, i) => new { t, i })
            )
            .GroupBy(z => z.i)
            .Select(g => g.Select(z => z.t).ToList())
            .ToList();
    }

    public static Dictionary<int, T> ToDictionaryByIndex<T>
        (this IEnumerable<T> source)
    {
        return source
            .Select((t, i) => new { t, i })
            .ToDictionary(z => z.i, z => z.t);
    }

Disclaimer: if you call Pivot with a non-rectangular structure, I don't know/care what will happen.

David B
A: 

This is not quite as general as you asked for but it joins two enumerables into an enumerable of pairs. It is pretty trivial to turn that result enumerable into a dictionary.

public static class ExtensionMethods
{
 public static IEnumerable<Pair<TOuter, TInner>> InnerPair<TInner, TOuter>(this IEnumerable<TOuter> master,
                   IEnumerable<TInner> minor)
 {
  if (master == null)
   throw new ArgumentNullException("master");
  if (minor == null)
   throw new ArgumentNullException("minor");
  return InnerPairIterator(master, minor);
 }

 public static IEnumerable<Pair<TOuter, TInner>> InnerPairIterator<TOuter, TInner>(IEnumerable<TOuter> master,
                     IEnumerable<TInner> minor)
 {
  IEnumerator<TOuter> imaster = master.GetEnumerator();
  IEnumerator<TInner> iminor = minor.GetEnumerator();
  while (imaster.MoveNext() && iminor.MoveNext())
  {
   yield return
    new Pair<TOuter, TInner> { First = imaster.Current, Second = iminor.Current };
  }
 }
}


public class Pair<TFirst, TSecond>
{
 public TFirst First { get; set; }
 public TSecond Second { get; set; }
}
Timbo