views: 60
answers: 2

I need to retrieve 5 objects that match certain complex criteria, and I can't/don't want to pass those criteria to the WHERE clause (`filter` in Django). So I need to iterate over the results, testing each record against the criteria until I have my 5 objects; after that I want to throw the queryset away and never see it again.

In most cases the records I need will be at the beginning of the queryset; in the worst case they will be at its end. The table is huge and I only need 5 records. So my question is: how do I iterate over a queryset without Django caching the results? It must work in a way where neither the SQL engine nor Django stores or caches the results anywhere.

+1  A: 

Django does not have a global queryset cache (see ticket #14). This means that as long as you don't hold a reference to the queryset, its data is no longer cached once it goes out of scope, and the garbage collector will free the memory on the next cleanup. So in code such as:

my_objects = [obj for obj in MyModel.objects.all() if my_complex_condition(obj)]

the only caching Django does is within that one queryset instance, and after this line every reference to that cache is gone. Note that even if Django had no cache whatsoever, memory would still fill up in the same manner while the rows were read, and the GC would collect the rows individually anyway.
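In practice, the usual way to scan for the first few matches without filling the queryset's result cache is `QuerySet.iterator()` combined with `itertools.islice`, which stops pulling rows as soon as enough matches are found. A minimal sketch; `MyModel` and `my_complex_condition` are the question's hypothetical names, and the helper itself works on any iterable:

```python
from itertools import islice

def take_matching(iterable, predicate, n):
    """Lazily pull items until n matches are found, then stop.

    With Django you would pass MyModel.objects.iterator(), which
    streams rows instead of populating the queryset's result cache.
    """
    return list(islice((x for x in iterable if predicate(x)), n))

# Plain-Python demonstration of the same pattern:
first_five_even = take_matching(range(100), lambda x: x % 2 == 0, 5)
print(first_five_even)  # [0, 2, 4, 6, 8]
```

In a Django view this would read `take_matching(MyModel.objects.iterator(), my_complex_condition, 5)` (names illustrative); iteration stops at the fifth match, so in the common case only the beginning of the table is fetched.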

Mike Axiak
+1  A: 

Why worry about caching? Let Django and MySQL do what they do.

If you are bent on it, you could disable caching for Django. That is quite a simple thing to do in your project's settings.py.

For MySQL, you need to run some queries to disable the query cache.

Try using the SQL_NO_CACHE hint in your query, like so:

SELECT SQL_NO_CACHE * FROM TABLE

This will stop MySQL from caching the results; be aware, however, that other OS and disk caches may also affect performance, and those are harder to get around.
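If you build raw SQL strings to run through Django's `connection.cursor()`, the hint simply has to follow the SELECT keyword. A small illustrative helper (the function name is made up; only SELECT statements accept the hint):

```python
def with_no_cache_hint(sql):
    """Prefix a SELECT statement with MySQL's SQL_NO_CACHE hint."""
    head, sep, tail = sql.partition(" ")
    if head.upper() == "SELECT":
        return "SELECT SQL_NO_CACHE " + tail
    return sql  # leave non-SELECT statements untouched

print(with_no_cache_hint("SELECT * FROM TABLE"))
# SELECT SQL_NO_CACHE * FROM TABLE
```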

One problem with this method is that it only seems to prevent the result of your own query from being cached. If you're querying a database that other clients are actively using with the same query, their queries may still populate the cache and affect your results. I am continuing to research ways around this and will edit this post if I figure one out.

OR

You could also do RESET QUERY CACHE

OR

FLUSH QUERY CACHE

One point to note: I would suggest letting MySQL handle the WHERE clause, since its query-optimization layer is very effective if you have the right fields indexed. Fetching all the results and doing the WHERE clause's work yourself might slow you down, depending on the size of the result set. Just something to think about; proper benchmarking should show you the way.

MovieYoda
Do you know how to disable the cache in PostgreSQL (i.e. force it to use a cursor)?
Thiado de Arruda
You can get rid of PostgreSQL's caches in shared_buffers by restarting the PostgreSQL server; I don't know of a more convenient way. Alternatively, set a really minimal shared_buffers that's just enough for your connections, so there's not much room for cached data.
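For the second option, the setting lives in postgresql.conf. A sketch of such a fragment; the exact value is illustrative, and the practical minimum depends on your PostgreSQL version and connection count:

```
# postgresql.conf -- deliberately tiny buffer cache (illustrative value)
shared_buffers = 128kB   # near the minimum; leaves almost no room for cached pages
```

Note this only shrinks PostgreSQL's own buffer cache; the operating system's page cache will still hold recently read table data.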
MovieYoda
@Thiado: added more stuff to my answer. Hope this helps. Any reason you are bent on no caching? Some performance testing going on?
MovieYoda
No, I'm just running through a table that may be very big
Thiado de Arruda