I'm using the Django database models from a process that's not called from an HTTP request. The process is supposed to poll for new data every few seconds and do some processing on it. I have a loop that sleeps for a few seconds and then gets all unhandled data from the database.

What I'm seeing is that after the first fetch, the process never sees any new data. I ran a few tests and it looks like Django is caching results, even though I'm building new QuerySets every time. To verify this, I did this from a Python shell:

>>> MyModel.objects.count()
885
# (Here I added some more data from another process.)
>>> MyModel.objects.count()
885
>>> MyModel.objects.update()
0
>>> MyModel.objects.count()
1025

As you can see, adding new data doesn't change the result count. However, calling the manager's update() method seems to fix the problem.

I can't find any documentation on that update() method and have no idea what other bad things it might do.

My question is, why am I seeing this caching behavior, which contradicts what the Django docs say? And how do I prevent it from happening?

A: 

It seems like count() is cached after the first call. This is the Django source for QuerySet.count:

def count(self):
    """
    Performs a SELECT COUNT() and returns the number of records as an
    integer.

    If the QuerySet is already fully cached this simply returns the length
    of the cached results set to avoid multiple SELECT COUNT(*) calls.
    """
    if self._result_cache is not None and not self._iter:
        return len(self._result_cache)

    return self.query.get_count(using=self.db)

update does seem to do quite a bit of extra work beyond what you need, but I can't think of any better way to do this, short of writing your own SQL for the count. If performance is not super important, I would just do what you're doing, calling update before count.

QuerySet.update:

def update(self, **kwargs):
    """
    Updates all elements in the current QuerySet, setting all the given
    fields to the appropriate values.
    """
    assert self.query.can_filter(), \
            "Cannot update a query once a slice has been taken."
    self._for_write = True
    query = self.query.clone(sql.UpdateQuery)
    query.add_update_values(kwargs)
    if not transaction.is_managed(using=self.db):
        transaction.enter_transaction_management(using=self.db)
        forced_managed = True
    else:
        forced_managed = False
    try:
        rows = query.get_compiler(self.db).execute_sql(None)
        if forced_managed:
            transaction.commit(using=self.db)
        else:
            transaction.commit_unless_managed(using=self.db)
    finally:
        if forced_managed:
            transaction.leave_transaction_management(using=self.db)
    self._result_cache = None
    return rows
update.alters_data = True
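The interaction between the two methods quoted above can be sketched with a minimal, self-contained stand-in (a hypothetical FakeQuerySet, not Django's real class): count() serves the cached length once results have been evaluated, and update() ends by clearing _result_cache, just like the last line of QuerySet.update:

```python
class FakeQuerySet:
    """Toy model of QuerySet result caching (hypothetical, not Django code)."""

    def __init__(self, table):
        self._table = table          # the "database" rows, shared across processes
        self._result_cache = None    # filled on first evaluation

    def __iter__(self):
        if self._result_cache is None:
            self._result_cache = list(self._table)  # "evaluate" the query once
        return iter(self._result_cache)

    def count(self):
        # Mirrors QuerySet.count: use the cache if results are already cached.
        if self._result_cache is not None:
            return len(self._result_cache)
        return len(self._table)      # otherwise hit the "database"

    def update(self):
        # Mirrors the end of QuerySet.update: invalidate the result cache.
        self._result_cache = None
        return 0

table = list(range(885))
qs = FakeQuerySet(table)
list(qs)                    # evaluate once; results are now cached
table.extend(range(140))    # another process inserts 140 rows
stale = qs.count()          # 885: served from the stale cache
qs.update()                 # clears _result_cache
fresh = qs.count()          # 1025: hits the "database" again
print(stale, fresh)         # → 885 1025
```

This reproduces the shell transcript from the question: the count is frozen until update() wipes the cache.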
A: 

You can also use MyModel.objects._clone().count(). All of the methods in the QuerySet call _clone() prior to doing any work - that ensures that any internal caches are invalidated.

The root cause is that MyModel.objects is the same instance each time. By cloning it you're creating a new instance without the cached value. Of course, you can always reach in and invalidate the cache if you'd prefer to use the same instance.

Travis Swicegood
That looks like an awesome and easy solution, but at least on my Django version, it doesn't work. Calling MyModel.objects._clone() results in an "AttributeError: 'Manager' object has no attribute '_clone'". I can do MyModel.objects.all()._clone(), but that works just as before -- it doesn't change until I call update(). I'm using Django 1.2.1.
scippy
My bad - it should be `MyModel.objects.all()._clone()`. In thinking about it, you could get away with doing a `MyModel.objects.all().count()` without the `_clone()`. That creates a new version of the base object and should get you a new version without the cached value. That is, unless Django's doing something devious there and carrying the state with the clone.
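The point of this correction can be sketched with the same kind of stand-in (hypothetical names, not Django's classes): a clone, or a freshly built queryset from all(), shares the underlying query but starts with an empty cache, so the stale _result_cache on the old instance is simply never consulted:

```python
class FakeQuerySet:
    """Toy model of QuerySet cloning (hypothetical, not Django code)."""

    def __init__(self, table):
        self._table = table
        self._result_cache = None

    def _clone(self):
        # A clone shares the underlying query/table but starts uncached.
        return FakeQuerySet(self._table)

    def count(self):
        if self._result_cache is None:
            self._result_cache = list(self._table)  # first evaluation caches
        return len(self._result_cache)

table = ['row'] * 885
qs = FakeQuerySet(table)
qs.count()                    # 885, and now cached on qs
table.extend(['row'] * 140)   # another process adds rows
print(qs.count())             # → 885  (stale cache on the old instance)
print(qs._clone().count())    # → 1025 (fresh object, fresh cache)
```

The same logic applies to rebuilding the queryset from scratch inside a polling loop instead of reusing one instance across iterations.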
Travis Swicegood
A: 

We've struggled a fair bit with forcing Django to refresh the cache. This might not apply to your example, but certainly in Django views there is, by default, an implicit transaction, which MySQL then isolates from any changes that happen in other processes after you start.

We used the @transaction.commit_manually decorator and calls to transaction.commit() just before every occasion where you need up-to-date info.

As I say, this definitely applies to views; I'm not sure whether it applies to Django code that isn't run inside a view.

Detailed info here:

http://blog.projectdirigible.com/?p=439
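The isolation effect described here can be simulated without a database (a hypothetical FakeConnection standing in for MySQL's REPEATABLE READ behaviour): reads within one transaction see a snapshot fixed at the first read, and commit() ends the transaction so the next read takes a fresh snapshot:

```python
class FakeConnection:
    """Toy model of REPEATABLE READ isolation (not a real DB driver)."""

    def __init__(self, table):
        self._table = table      # live data, visible to "other processes"
        self._snapshot = None    # per-transaction snapshot

    def count(self):
        if self._snapshot is None:
            # The first read in a transaction fixes the snapshot.
            self._snapshot = list(self._table)
        return len(self._snapshot)

    def commit(self):
        # Ending the transaction discards the snapshot; the next
        # read starts a new transaction and sees fresh data.
        self._snapshot = None

table = [0] * 885
conn = FakeConnection(table)
print(conn.count())      # → 885
table.extend([0] * 140)  # another process commits 140 new rows
print(conn.count())      # → 885  (same transaction, same snapshot)
conn.commit()            # like calling transaction.commit() in the view
print(conn.count())      # → 1025 (new transaction sees the new rows)
```

This is why calling transaction.commit() just before the query fetches up-to-date data: it is the transaction boundary, not a Django-side cache, that determines what the read sees.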

hwjp
A: 

You can clear the cache:

    from django.core.cache import cache

    cache.clear()          # clears the entire cache
    cache.delete('key')    # deletes the cached entry for 'key'

You can also set per-view caching:

    from django.views.decorators.cache import cache_page

    @cache_page(60 * 15)  # cache this view for 15 minutes
    def my_view(request):
        ...

You can also disable caching for the entire site by putting this in your settings:

    CACHE_BACKEND = 'dummy://'

Rajani Karuturi