ansaurus

Question

Answer 1

+1 A:

Let's think about this. Presumably you want to store (hash of expression tree, result) in a map.

Without storing the whole tree, you can't distinguish between an identical tree and a hash collision.

By definition, hashes map a larger set onto a smaller set (this is why a hash is useful), so by definition, you'll have (at least the possibility of) collisions.

When you get an expression tree, you'll hash it and then go look up a result in your map, which leads to two possibilities:

it's a hash not in the map, one that you haven't seen before. You have to let this one run, as you don't have a cached result.
it's a hash in the map, but since you didn't store the old expression tree that produced the hash in your map, there's no way to compare the newly passed expression to the old one. You may have a match or you may have a collision, but there's no way to distinguish between those two possibilities. You can't return the cached result, because it might be a collision.

Furthermore, even if it's not a collision, even if it is the exact same expression tree as the one you last saw, how do we know that the backing object -- a database, a list, whatever* hasn't had elements added or deleted or modified such that the result returned by the expression might be different than the cached result?

That said, you can hash a tree recursively:

hashATree:
if leaf node
  return hash(node)
else
  return hash(node) *OP* hashATree(left.child) *OP* hashATree(right.child)

where OP is some operation (probably multiplication), or more generally hash(node) *OP* accumulate( children.begin(), children.end(), *OP* );

Of course, this is the same recursion we use to evaluate an expression tree(expect we call node.eval( children);)

tpdi 2009-04-17 01:11:02

To nitpick, hashes don't always go from a big set to a small set, EG perfect hashes.

RossFabricant 2009-04-17 01:29:04

Your point about collisions is theoretically correct but also practically irrelevant. MD5 and SHA have the potential for collisions as well but we still use them because they are practical. I need a practical hash, not a mathematically collision-proof hash.

Rex M 2009-04-17 01:54:11

To answer your question about synchronization problems with the backing store, I have solved those problems. However, I don't feel comfortable going to production with my current method of keying the cache (storing the expression tree itself)

Rex M 2009-04-17 02:07:42

Answer 2

+1 A:

Um, actually I think this might be quite simple.

The ToString() method of an Expression object will give you a textual representation of the Expression, you could hash that if all you want is to evaluate equivalency of a key.

Tim Jarvis 2009-04-17 01:23:09

This doesn't incorporate a full hash of the expression tree...and many expression trees many have the same string value, but different underlying nodes. Do you have any suggestions? I need to accomplish this as well. I'm thinking of making a hashing visitor that specifically knows to include and call GetHashCode on the underlying values of ConstantExpressions. What do you think?

JeffN825 2010-10-07 15:54:01

my understanding was that it was the full text of the expression tree, for example this is what I get with a query with a where clause and a projection.....value(Quest.Intersect.Core.DataHubQueryableTable`1[Quest.Intersect.TestHarness.Customer]).Where(cust => (cust.LastName = "jarvis")).Select(cust => new <>f__AnonymousType1`2(CompID = cust.CompanyID, LastName = cust.LastName))

Tim Jarvis 2010-10-07 23:56:33

As you can see this contains the evaluated constants plus the anonymous class etc. Seems to me that a hash of this would be a good enough key in a dictionary....however, having said that the alternative would as you suggest, be creating a visitor that created a composite hash of each node/leaf

Tim Jarvis 2010-10-07 23:59:02

ansaurus

tags:

views:

answers:

Hashing an expression tree

related questions