views:

925

answers:

2

I have two tables:

block: int id, [other columns...]
blockInstance: int id, int blockId (references Block.Id), [other columns...]

My goal is to generate an enumeration of block objects with two properties: (1) Id (the Id of the Block) and (2) InstanceCount (the number of Instances for the Block). Currently, my XML file contains no Block Instances (that table exists but has zero rows).

Here's my (non-working) query:

var groupedBlocks =
    from
        b in _tableDictionary["block"].Elements()
    join
        bi in _tableDictionary["blockInstance"].Elements()
        on b.Element(_ns + "id").Value equals bi.Element(_ns + "blockId").Value into j
    from
        lj in j.DefaultIfEmpty()
    group
        lj by b.Element(_ns + "id").Value into g
    select new
    {
        Id = g.Key,
        InstanceCount = g.Count(i => i.Element(_ns + "blockId") != null)
    };

The problem is with the predicate (lambda expression) for g.Count(). If I remove the predicate and just use g.Count(), the query generates the proper number of rows, but the InstanceCount for each row is 1, which is wrong (it should be zero). With the predicate in there, the query return zero rows, and if I attempt to view the ResultsView in the debugger, it says "Exception: object reference not set to an instance of an object".

My greatest confusion is what exactly is "i" in my lambda expression. I know it's an XElement, but what does this XElement contain when the data is the result of a join (in this case, a left outer join)?

Okay, well I got one more idea while I was typing this out, and it actually worked :-), but I have no idea why :-(, so I'm still going to ask the question ;-).

If I change the offending code to...

InstanceCount = g.Count(i => i != null)

...it works. But why?! Again, what is getting passed into the lamba expression? And why is it null?

Thanks!

A: 

The parameter passed to the Count lambda is the same type of the one you are grouping. That's because you are calling Count in the context of g which is an IGrouping<string, XElement>.

So, i is XElement.

In your first query you were actually checking for the first child of the grouped XElement, (accessing the i.Element method), that's why it didn't worked.

CMS
+2  A: 

i as referenced in your group count is equivalent to lj, which is returned from enumerating j.DefaultIfEmpty().

DefaultIfEmpty() in this context will return the default value for the type of the item, where the item does not have an output match from the previous join..into statement.

In the case of reference types (XElement is a reference type), the default value is always null, which is why you are getting a null reference exception, and why checking for null first removes the problem.

EDIT:

An alternative method that avoids the grouping by using a sub-query:

var groupedBlocks = 
    from b in _tableDictionary["block"].Elements() 
    select new 
    { 
        Id = b.Element(_ns + "id").Value, 
        InstanceCount = (from bi in _tableDictionary["blockInstance"].Elements() 
                         where bi.Element(_ns + "blockId").Value == 
                               b.Element(_ns + "id").Value
                         select bi).Count()
    };
Sam
@Sam Thank you, Sam. That almost clears it up for me. The one thing I still don't get is, If lj contains only nulls, how was I able to group lj by b.Element(_ns + "id").Value in the first place? In other words, how would the compiler know which item belongs to which key (Block Id) if the enumeration it is working with is {null, null, null...}? It just doesn't make any sense to me.
DanM
You're not just enumerating lj - you're enumerating a group join on b and lj as returned by j.DefaultIfEmpty(). When the group by is executed, it enumerates each item from the group join (which can be thought of as a [b, lj] tuple), extracts the key from b, and adds lj to the grouping that matches that key. When you enumerate the groups later in the projection, you are only enumerating lj, hence the problems with null. Is that any clearer? Group joins are a bit complex...
Sam
@Sam, I like your alternative query. I can make sense of that very easily. As for the original query, I guess I'm now trying to get my head around what the scope of "into" is. Does "into j" mean put the results of the join into j or just put the BlockInstance items into j? Another confusion is, if I don't force the join to be a left join using DefaultEmpty(), it does an inner join (and, in this example, I get zero rows). So, DefaultEmpty() somehow magically *expands* the results set. It's almost like the query is looking ahead to the DefaultEmpty() statement when building j. Thanks again!
DanM
@Sam, BTW, I decided to use your code query instead because it made what I was ultimately trying to do *much* easier and clearer, but I had to add a "select" clause to the end of the sub-query before it would compile. (I used: select bi.Element(_ns + "blockId").) Anyway, works like a charm.
DanM
Whoops, always forget those pesky projections - edited. "into j" is basically saying, defer the join enumeration for now, but use a group join when enumerated. Then when you say "from lj in j.DefaultIfEmpty()", j is enumerated using a special iterator that returns a default value (lj) when there is a value in the outer sequence (b) that does not have a matching sequence of values in the inner sequence (bi).
Sam
@Sam, Much clearer now. Thanks for all your help!
DanM