views:

101

answers:

2

In reference to this different but not unrelated question I will borrow the example models.

class Foo(db.Model): bars = db.ListProperty(db.Key)

class Bar(db.Model): pass

If I have a certain Foo entity and I want to get all of the other foo entities also containing a certain bar Key in its bars ListProperty, I would use the following query:

related_foos = Foo.all().filter('bars', bar_entity).fetch(fetch_count)

What about if I want to find all other entities of model kind Foo that have at least N number of matching bar entities? The obvious way to do this with a for-loop would involve drastic inefficiencies, and it might be best to actually change the model itself to make this easier, but it doesn't seem obvious how to do so.

+1  A: 

You can simply apply the same filter repeatedly:

related_foos = Foo.all().filter('bars', bar_entity).filter('bars', bar_entity_2).fetch(fetch_count)

Or, data-driven:

q = Foo.all()
for bar in bar_entities:
  q.filter('bars', bar)
related_foos = q.fetch(fetch_count)

If you don't apply any inequalities or sort orders to the query, the datastore will be able to execute the queries using the built in indexes and the merge join strategy, regardless of how many filters you apply. If you need an inequality or sort order, however, you'll need to have an index for each number of bars you might want to filter on, which leads to exploding indexes (and so is best avoided!)

Nick Johnson
This is a good answer...but does not answer my question. I elaborated in a comment to the question.
jamtoday
You say "I want to get all other Foo entities with at least 2 matching bar keys." - matching what? Do you have a list of keys, and you want to find all entities where the intersection of the two sets contains at least two items?
Nick Johnson
+1  A: 

Given a foo record that has 10 bar_entities and looking for all foo records that have at least 2 of these 10 entities would result in 45 possible equality values 10!/(2!*(10-2)!)=45.

This can be deduced in 10_C_(2-1)=10 reads.

SELECT * from table WHERE bar="1" AND bar in ["2", "3", "4", "5", "6", "7", "8", "9", "0"]
SELECT * from table WHERE bar="2" AND bar in ["3", "4", "5", "6", "7", "8", "9", "0"]
SELECT * from table WHERE bar="3" AND bar in ["4", "5", "6", "7", "8", "9", "0"]
etc.

To reduce this to one read would require that when a foo record is added you populate a separate table that had all the 2 combinations for a given record.

Say you had

foo_table
foo1 [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
foo2 [1, 3, 4]
foo3 [1, 2, a]
foo4 [b, 6, c]

foo_combo_2_table
Parent  Combination
foo1    12
foo1    13
... and all 45 foo1 combinations each in its own row
foo2    13
foo2    14
foo2    34
foo3    12
foo3    1a
foo3    2a
etc.

Now you can do a 

indexes = SELECT __KEY__ from foo_combo_2_table WHERE combination IN [12, 13, 14, 15, ... all 45]
keys = [k.parent() for k in indexes] # you would need to filter for duplicates

This way you wont get into any exploding index issues.

If you also wanted to do any 3 or any 4 entities than for each of these you would need to create a foo_combo_n_table or do a 10_C_(n-1) number of reads.

molicule