views:

1218

answers:

2

Hi All,

Given a DataSet, I'm left joining DataTables[1-n] onto DataTable[0]. I have created a method with a signature as follows:

public DataTable LeftJoin(DataSet ds, params JoinKey[] JoinKey)

Notes:

  • The return type is a DataTable, which is the resulting left join.
  • DataSet ds is a collection of DataTables for joining.
  • JoinKey is an object with two public properties: Type DataType and string Name. This collection contains the Type/Name of each field to use in the join.

Following are two snippets (Snippet 1 and Snippet 2) Snippet 1 works correctly. The only issue is that the second and third parameters of GroupJoin are hard-coded:

// hard-coded:
br => new                                  
{                                   
  zip = br.Field<string>("ZipCode"),                                   
  store =br.Field<double>"StoreID")
},                                 
jr => new                                 
{                                   
  zip = jr.Field<string>("ZipCode"),                                   
  store = jr.Field<double>("StoreID")                                 
}

The above is not desirable. Instead, I would like to use my "JoinKey" object to dynamically set the fields I want to join on (i.e., ZipCode and StoreID). I have attempted this in Snippet 2. First, however, please see Snippet 1.

Snippet 1 (working, hard-coded):

var dtBase = ds.Tables[0].AsEnumerable();  
for (int i = 1; i < ds.Tables.Count; i++)
{  
    var query = dtBase.GroupJoin(ds.Tables[i].AsEnumerable(),
                                 br => new 
                                 {
                                   zip = br.Field<string>("ZipCode"),
                                   store = br.Field<double>("StoreID")
                                 },
                                 jr => new
                                 {
                                   zip = jr.Field<string>("ZipCode"),
                                   store = jr.Field<double>("StoreID")
                                 },
                                 (baseRow, joinRow) => joinRow.DefaultIfEmpty()
                                     .Select(row => new
                                     {
                                         flatRow = baseRow.ItemArray.Concat((row == null) ? new object[ds.Tables[i].Columns.Count] : row.ItemArray).ToArray()
                                     })
                       ).SelectMany(s => s);  



   [... create a DataTable with the resulting left join, etc. ...]  
}

Note: the variable, "flatRow", stores an object array of left joined data; it is added to a DataRowCollection later in the method (not shown).

Snippet 2 (not working; no errors thrown, however):

var dtBase = ds.Tables[0].AsEnumerable();  
for (int i = 1; i < ds.Tables.Count; i++)
{  
    var query = dtBase.GroupJoin(ds.Tables[i].AsEnumerable(),
                                 or => KeySelector(or, JoinKey),  
                                 ir => KeySelector(ir, JoinKey),
                                 (baseRow, joinRows) => joinRows.DefaultIfEmpty()
                                     .Select(joinRow => new
                                     {
                                         flatRow = baseRow.ItemArray.Concat((joinRow == null) ? new object[ds.Tables[i].Columns.Count] : joinRow.ItemArray).ToArray()
                                     })
                                 )
                       .SelectMany(s => s);

    [... create a DataTable with the resulting left join, etc. ...]  
}

Here is the KeySelector function used above (please see the comments inline):

private IEnumerable KeySelector(DataRow dr, params JoinKey[] JoinKey)
{
  List<object> gl = new List<object>();
  foreach (JoinKey jk in JoinKey)
  {
    // note that I did try to mimic the 'hard-coded' approach from Snippet 1:
    // this does not work:
    // gl.Add(dr.Field<jk.DataType>(jk.Name)); --> it does not like <jk.DataType>

    // I 'hacked' around it by using the following:

    gl.Add(dr[dr.Table.Columns.IndexOf(jk.Name)]);
  }
  return gl;
}

Snippet 2 only returns data from DataTable[0]. None of the data from DataTable[1-n] (if any exists) is concatenated into the variable, "flatRow." Interestingly, though, I do get the correct number of elements in the array. I know this has something to do with the KeySelector method, but (obviously) I have no idea what the issue is.

Please let me know if anyone needs additional information. Your help is greatly appreciated...

Thanks,
Tyler

+1  A: 

One thing that immediately occurs is that using a double in a key is asking for trouble; equality comparisons on floating point numbers are notoriously flakey. I'm still looking, though.

I believe the main problem here is that there is simply no automatic equality on lists based on contents; you are returning different lists (from KeySelector), so they simply never equal eachother. You might be able to provide a custom comparer...

This is messy, but gets something returning:

    class SetComparer : IEqualityComparer<IEnumerable>
    {

        public readonly static SetComparer Default = new SetComparer();

        public bool Equals(IEnumerable x, IEnumerable y)
        {
            return Enumerable.SequenceEqual(x.Cast<object>(), y.Cast<object>());
        }

        public int GetHashCode(IEnumerable data)
        {
            int hash = 0;
            foreach (object obj in data)
            {
                if (obj != null)
                {
                    hash = hash * 7 + 13 * obj.GetHashCode();
                }
            }
            return hash;
        }
    }

And pass SetComparer.Default in as the final (optional) arg to GroupJoin.

Update: found my bug - I'd borked GetHashCode(); fixed.

You could probably also do something similar by building an expression at runtime for use as the comparer (and just returning the row itself as the key), but this is more complex.

Marc Gravell
A: 

Hi Marc,

Thank you! I completely overlooked the final, optional argument to GroupJoin. Also, thanks for your note on comparing floating point numbers.

This is actually my first project using LINQ, so some of it is still a bit confusing (i.e., syntax, terminology, etc.). Would you mind elaborating a little on what you said here?

"You could probably also do something similar by building an expression at runtime for use as the comparer (and just returning the row itself as the key), but this is more complex."

I'm grateful for your help.

-Tyler

Tyler