views:

248

answers:

2

I am working on building some pseudo-intelligent caching into a LINQ query provider. What I'd like to do (ideally) is use the expression tree of a given query as the cache key in some scenarios. However, I don't want to store the entire object graph itself, so what's a quick way to get a hashsum-like value from an expression tree? Or if I'm going in the wrong direction, is there a better option?

+1  A: 

Let's think about this. Presumably you want to store (hash of expression tree, result) in a map.

Without storing the whole tree, you can't distinguish between an identical tree and a hash collision.

By definition, hashes map a larger set onto a smaller set (this is why a hash is useful), so by definition, you'll have (at least the possibility of) collisions.

When you get an expression tree, you'll hash it and then go look up a result in your map, which leads to two possibilities:

  1. it's a hash not in the map, one that you haven't seen before. You have to let this one run, as you don't have a cached result.

  2. it's a hash in the map, but since you didn't store the old expression tree that produced the hash in your map, there's no way to compare the newly passed expression to the old one. You may have a match or you may have a collision, but there's no way to distinguish between those two possibilities. You can't return the cached result, because it might be a collision.

Furthermore, even if it's not a collision, even if it is the exact same expression tree as the one you last saw, how do we know that the backing object -- a database, a list, whatever* hasn't had elements added or deleted or modified such that the result returned by the expression might be different than the cached result?

That said, you can hash a tree recursively:

hashATree:
if leaf node
  return hash(node)
else
  return hash(node) *OP* hashATree(left.child) *OP* hashATree(right.child)

where OP is some operation (probably multiplication), or more generally hash(node) *OP* accumulate( children.begin(), children.end(), *OP* );

Of course, this is the same recursion we use to evaluate an expression tree(expect we call node.eval( children);)

tpdi
To nitpick, hashes don't always go from a big set to a small set, EG perfect hashes.
RossFabricant
Your point about collisions is theoretically correct but also practically irrelevant. MD5 and SHA have the potential for collisions as well but we still use them because they are practical. I need a practical hash, not a mathematically collision-proof hash.
Rex M
To answer your question about synchronization problems with the backing store, I have solved those problems. However, I don't feel comfortable going to production with my current method of keying the cache (storing the expression tree itself)
Rex M
+1  A: 

Um, actually I think this might be quite simple.

The ToString() method of an Expression object will give you a textual representation of the Expression, you could hash that if all you want is to evaluate equivalency of a key.

Tim Jarvis
This doesn't incorporate a full hash of the expression tree...and many expression trees many have the same string value, but different underlying nodes. Do you have any suggestions? I need to accomplish this as well. I'm thinking of making a hashing visitor that specifically knows to include and call GetHashCode on the underlying values of ConstantExpressions. What do you think?
JeffN825
my understanding was that it was the full text of the expression tree, for example this is what I get with a query with a where clause and a projection.....value(Quest.Intersect.Core.DataHubQueryableTable`1[Quest.Intersect.TestHarness.Customer]).Where(cust => (cust.LastName = "jarvis")).Select(cust => new <>f__AnonymousType1`2(CompID = cust.CompanyID, LastName = cust.LastName))
Tim Jarvis
As you can see this contains the evaluated constants plus the anonymous class etc. Seems to me that a hash of this would be a good enough key in a dictionary....however, having said that the alternative would as you suggest, be creating a visitor that created a composite hash of each node/leaf
Tim Jarvis