views:

189

answers:

3

I have a model containing ranges of IP addresses, similar to this:

class Country(db.Model):
  begin_ipnum = db.IntegerProperty()
  end_ipnum = db.IntegerProperty()

On a SQL database, I would be able to find rows which contained an IP in a certain range like this:

SELECT * FROM Country WHERE ipnum BETWEEN begin_ipnum AND end_ipnum

or this:

SELECT * FROM Country WHERE begin_ipnum < ipnum AND end_ipnum > ipnum

Sadly, GQL only allows inequality filters on one property, and doesn't support the BETWEEN syntax. How can I work around this and construct a query equivalent to these on App Engine?

Also, can a ListProperty be 'live' or does it have to be computed when the record is created?

question updated with a first stab at a solution:

So based on David's answer below and articles such as these:

http://appengine-cookbook.appspot.com/recipe/custom-model-properties-are-cute/

I'm trying to add a custom field to my model like so:

class IpRangeProperty(db.Property):
  def __init__(self, begin=None, end=None, **kwargs):
    if not isinstance(begin, db.IntegerProperty) or not isinstance(end, db.IntegerProperty):
        raise TypeError('Begin and End must be Integers.')
    self.begin = begin
    self.end = end
    super(IpRangeProperty, self).__init__(self.begin, self.end, **kwargs)

  def get_value_for_datastore(self, model_instance):
    begin = self.begin.get_value_for_datastore(model_instance)
    end = self.end.get_value_for_datastore(model_instance)
    if begin is not None and end is not None:
      return range(begin, end)

class Country(db.Model):
  begin_ipnum = db.IntegerProperty()
  end_ipnum = db.IntegerProperty()
  ip_range = IpRangeProperty(begin=begin_ipnum, end=end_ipnum)

The thinking is that after i add the custom property i can just import my dataset as is and then run queries on based on the ListProperty like so:

q = Country.gql('WHERE ip_range = :1', my_num_ipaddress)

When i try to insert new Country objects this fails though, complaning about not being able to create the name:

...
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/db/__init__.py", line 619, in _attr_name
return '_' + self.name
TypeError: cannot concatenate 'str' and 'IntegerProperty' objects

I tried defining an attr_name method for the new property or just setting self.name but that does not seem to help. Hopelessly stuck or heading in the right direction?

+1  A: 

Short answer: Between queries aren't really supported at the moment. However, if you know a priori that your range is going to be relatively small, then you can fake it: just store a list on the entity with every number in the range. Then you can use a simple equality filter to get entities whose ranges contain a particular value. Obviously this won't work if your range is large. But here's how it would work:

class M(db.Model):
    r = db.ListProperty(int)

# create an instance of M which has a range from `begin` to `end` (inclusive)
M(r=range(begin, end+1)).put()

# query to find instances of M which contain a value `v`
q = M.gql('WHERE r = :1', v)

The better solution (eventually - for now the following only works on the development server due to a bug (see issue 798). In theory, you can work around the limitations you mentioned and perform a range query by taking advantage of how db.ListProperty is queried. The idea is to store both the start and end of your range in a list (in your case, integers representing IP addresses). Then to get entities whose ranges contain some value v (i.e., between the two values in your list), you simply perform a query with two inequality filters on the list - one to ensure that v is at least as big as the smallest element in the list, and one to ensure that v is at least as small as the biggest element in the list.

Here's a simple example of how to implement this technique:

class M(db.Model):
    r = db.ListProperty(int)

# create an instance of M which has a rnage from `begin` to `end` (inclusive)
M(r=[begin, end]).put()

# query to find instances of M which contain a value `v`
q = M.gql('WHERE r >= :1 AND r <= :1', v)
David Underhill
The bug is the way the dev_appserver performs, unfortunately: the correct behaviour is the current production behaviour.
Nick Johnson
This looks like a promising solution, nice and simple. From the docs it seems i could make a custom model property that reports the list value 'live' based on the values of begin_ipnum and end_ipnum. But as far as i can tell i would not be able to query a custom property using GQL. So i'll try a computed ListProperty value for the range on object creation first. Will report back how i fare!
tijs
+2  A: 

My solution doesn't follow the pattern you have requested, but I think it would work well on app engine. I'm using a list of strings of CIDR ranges to define the IP blocks instead of specific begin and end numbers.

from google.appengine.ext import db    
class Country(db.Model):
    subnets = db.StringListProperty()
    country_code = db.StringProperty()

c = Country()
c.subnets = ['1.2.3.0/24', '1.2.0.0/16', '1.3.4.0/24']
c.country_code = 'US'
c.put()

c = Country()
c.subnets = ['2.2.3.0/24', '2.2.0.0/16', '2.3.4.0/24']
c.country_code = 'CA'
c.put()

# Search for 1.2.4.5 starting with most specific block and then expanding until found    
result = Country.all().filter('subnets =', '1.2.4.5/32').fetch(1)
result = Country.all().filter('subnets =', '1.2.4.4/31').fetch(1)
result = Country.all().filter('subnets =', '1.2.4.4/30').fetch(1)
result = Country.all().filter('subnets =', '1.2.4.0/29').fetch(1)
# ... repeat until found
# optimize by starting with the largest routing prefix actually found in your data (probably not 32)
cope360
Neat idea. This might often require many round-trips to the datastore though - which is okay if you don't need to scale, or if this kind of lookup will be infrequent. (Each round-trip will cost at least 60-80ms).
David Underhill
My solution trades storage for speed vs your solution. Assuming we account for all addresses, yours will have about 3B list items.To reduce round-trips in mine you could do an IN query for all possibilities at once.
cope360
An IN query would definitely help cut down on the run time - though this would make for a very expensive query since the IN query would be split into [up to] 30 sub-queries. (And an in-memory longest prefix match on the returned results if you wanted such a match; same with my solution.) That said, though expensive it is possible to execute on large address ranges :). (+1)
David Underhill
This is pretty smart but it's easier for me to wrap my head around David's solution so i'll try that first. The full dataset, ip ranges per country) that i'm using as my starting point is only about 130k entries by the way so it's not too bad.
tijs
@tijs my reference to items is the number IP addresses in all the lists total. You have 130k ranges but each range could contains tens of thousands of addresses. David's solution is much simpler to implement. Good luck!
cope360
A: 

This is odd, because I'm trying to figure out the same basic premise and running into all kinds of problems (and based on the 130K record number, assume you're using MaxMind Geo Country file). Although David's answer would work easy enough, it doesn't work in practice. This from Google:

First, if a query has multiple inequality filters on a given property, an entity will only match the query if it has an individual value for that property that matches all of the inequality filters. For example, if an entity has the values [1, 2] for property x, it will not match the query WHERE x > 1 AND x < 2. Each filter does match one of x's values, but there's no single value that matches both filters.

I batch uploaded the entire data set creating the listproperty field to attempt to no avail.

So any additional help would be greatly appreciated!

Sam
As you can probably tell from my updated question i couldn't get it to work either. I tried some other methods as well but quickly ran into other issues so i abandoned App engine for this project for now.I wasn't sure whether to mark any of the answers as right but since David's seemed to have the most chance of success i marked that one. Afraid i can't help you out though. If you do figure it out please leave a comment here so i can try your solution as well.
tijs
p.s. stackoverflow rules dictate that your question should have been posted as a comment not an answer. Just so you know... :)
tijs