views: 662

answers: 1

I'm trying to search for values within a date range for a specific type, but rows for dates that exist in the database are not being returned by the query.

Here is an extract of the Python code:

import datetime
from datetime import timedelta

from google.appengine.ext import db

deltaDays = timedelta(days=20)
endDate = datetime.date.today()
startDate = endDate - deltaDays

result = db.GqlQuery(
    "SELECT * FROM myData WHERE mytype = :1 AND pubdate >= :2 AND pubdate <= :3",
    type, startDate, endDate)


class myData(db.Model):
    mytype = db.StringProperty(required=True)
    value = db.FloatProperty(required=True)
    pubdate = db.DateTimeProperty(required=True)
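
Since pubdate is a DateTimeProperty and startDate/endDate are plain date objects, one thing worth ruling out is a date/datetime mismatch in the bounds. A rough sketch of the same query with the bounds pinned to full datetimes (only the bound values change):

lowerBound = datetime.datetime.combine(startDate, datetime.time.min)  # 00:00:00 on the start day
upperBound = datetime.datetime.combine(endDate, datetime.time.max)    # 23:59:59.999999 on the end day

result = db.GqlQuery(
    "SELECT * FROM myData WHERE mytype = :1 AND pubdate >= :2 AND pubdate <= :3",
    type, lowerBound, upperBound)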

The query returns data, but some rows that I am expecting are missing:

 2009-03-18 00:00:00
(missing from results: 2009-03-20; data exists in the database)
 2009-03-23 00:00:00
 2009-03-24 00:00:00
 2009-03-25 00:00:00
 2009-03-26 00:00:00
(missing from results: 2009-03-27; data exists in the database)
 2009-03-30 00:00:00
(missing from results: 2009-03-31; data exists in the database)
 2009-04-01 00:00:00
 2009-04-02 00:00:00
 2009-04-03 00:00:00
 2009-04-06 00:00:00

I uploaded the data via the bulkload script. I can only think of the indexes being corrupted or something similar. This same query used to work for another table I had, but I had to replace it with new content from another source, and the new content does not respond to the query in the same way. The table has around 700,000 rows, if that makes any difference.

I have done more research and it appears that it's a bug in the App Engine datastore. For more information about the bug, see this link: http://code.google.com/p/googleappengine/issues/detail?id=901

I have tried dropping the index and recreating it, with no luck.

Thanks.

+1  A: 

Nothing looks wrong to me. Are you sure that the missing dates also have mytype == type?
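
A quick way to check is to query one of the missing days on its own, without the mytype filter. Something like this sketch, using 2009-03-20 from your list:

import datetime

day = datetime.datetime(2009, 3, 20)
rows = myData.gql('WHERE pubdate >= :1 AND pubdate < :2',
                  day, day + datetime.timedelta(days=1)).fetch(10)
for r in rows:
    # if these rows show up here but not in the filtered query,
    # compare r.mytype against the value being bound as :1
    print r.mytype, r.pubdate, r.value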

I have observed some funny behaviour with indexes in the past. I recommend writing a handler to iterate through all of your records and just put() them back in the database; maybe something with the bulk uploader isn't working properly.

Here's the type of handler I use to iterate through all the entities in a model class:

# BaseRequestHandler is my webapp.RequestHandler subclass and Model is the
# db.Model class being iterated; json.write here comes from the old
# python-json library (the stdlib json module would use json.dumps instead).
class PPIterator(BaseRequestHandler):
  def get(self):
    query = Model.gql('ORDER BY __key__')
    last_key_str = self.request.get('last')
    if last_key_str:
      last_key = db.Key(last_key_str)
      query = Model.gql('WHERE __key__ > :1 ORDER BY __key__', last_key)
    entities = query.fetch(11)
    new_last_key_str = None
    if len(entities) == 11:
      # 11 results means there is at least one more batch; continue from the
      # 10th key next time (the 11th entity gets re-put, which is harmless)
      new_last_key_str = str(entities[9].key())
    for e in entities:
      e.put()
    if new_last_key_str:
      self.response.out.write(json.write(new_last_key_str))
    else:
      self.response.out.write(json.write('done'))

You can use whatever you want to iterate through the entities. I used to use JavaScript in a browser window, but found that was a pig when making hundreds of thousands of requests. These days I find it more convenient to use a Ruby script like this one:

require 'net/http'

last = nil
while last != 'done'
  host = 'your_host'   # the app's hostname, e.g. your-app.appspot.com
  path = '/your_path'
  path += "?last=#{last}" if last
  # the handler responds with a JSON-quoted string, so strip the quotes
  # before passing the key back (and before comparing against 'done')
  last = Net::HTTP.get(host, path).delete('"')
  puts last
end

Ben

UPDATE: Now that the remote API is working and reliable, I rarely write this type of handler any more. The same ideas apply to the code you'd use to iterate through the entities in the remote API console.
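
For example, here is a rough sketch of the same re-put loop as you might run it from the remote API console; the reput_all helper name and the batch size of 100 are just illustrative:

from google.appengine.ext import db

def reput_all(model_class, batch_size=100):
    last_key = None
    while True:
        if last_key:
            query = model_class.gql('WHERE __key__ > :1 ORDER BY __key__', last_key)
        else:
            query = model_class.gql('ORDER BY __key__')
        entities = query.fetch(batch_size)
        if not entities:
            break
        db.put(entities)  # re-writing the entities rebuilds their index rows
        last_key = entities[-1].key()
        print 'processed through', last_key

# e.g. reput_all(myData)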

mainsocial
Yes, the existing data has the same type as the one in the query. I tested it while printing the results to the log, and it is correct.
Benjamin Ortuzar
I have updated the question with more information.
Benjamin Ortuzar
Fantastic script, I will give it a try tonight. Do you have the HTML that grabs the new_last_key_str from the JSON and puts it back in the URL for a refresh?
Benjamin Ortuzar