I'm new to Django, but the application that I have in mind might end up having URLs that look like this:

http://mysite/compare/id_1/id_2

Where "id_1" and "id_2" are identifiers of two distinct Model objects. In the handler for "compare" I'd like to asynchronously, and in parallel, query and retrieve objects id_1 and id_2.

Is there any way to do this using a standard Django syntax? I'm hoping for pseudocode that ends up looking something like this:

import django.async 

# Issue the model query, but set it up asynchronously.  
# The next 2 lines don't actually touch my database 
o1 = Object(id=id_1).async_fetch()
o2 = Object(id=id_2).async_fetch()

# Now that I know what I want to query, fire off a fetch to do them all
# in parallel, and wait for all queries to finish before proceeding. 

async.Execute((o1, o2))

# Now the code can use data from o1 and o2 below...
+6  A: 

There aren't strictly asynchronous operations like the ones you've described, but I think you can achieve much the same effect by using Django's in_bulk() queryset method, which takes a list of ids to query.

Something like this for the urls.py:

urlpatterns = patterns('',
    (r'^compare/(\d+)/(\d+)/$', 'my.compareview'),
)

And this for the view:

def compareview(request, id1, id2):
    # in_bulk returns a dict keyed by primary key:
    #   { id1: <MyModel instance>, id2: <MyModel instance> }
    # The SQL pulls everything at once rather than sequentially... arguably
    # better than async, since it's one DB hit rather than two happening
    # at the same time.
    id1, id2 = int(id1), int(id2)  # URL captures arrive as strings
    comparables = MyModel.objects.in_bulk([id1, id2])
    o1, o2 = comparables.get(id1), comparables.get(id2)
    # ... render a comparison of o1 and o2 here ...
Jarret Hardie
Does in_bulk use threads to issue the queries in parallel, or are they still serialized? I'm looking to minimize page render latency.
slacy
in_bulk issues a single SQL query, so nothing is serialized or run in parallel... there's just a single DB hit that fetches both instances.
Jarret Hardie
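
(For illustration, a minimal sketch of what in_bulk boils down to: one SELECT with an IN clause, plus a dict keyed by primary key. It assumes the MyModel name from the answer and that the model lives in an app called myapp; printing the queryset's query attribute shows the generated SQL.)

# Rough sketch of what in_bulk does under the hood (not Django's exact code).
from myapp.models import MyModel  # assumed app/model names

qs = MyModel.objects.filter(pk__in=[1, 2])
print(qs.query)                     # a single SELECT ... WHERE id IN (1, 2)
objs = {obj.pk: obj for obj in qs}  # same shape as in_bulk's return value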
That's unfortunate. On a fast database, it's likely to be faster to issue N queries in parallel than to issue one giant one for all the objects. in_bulk() will reduce latency slightly, but I'm hoping for a page render that's O(1) in the number of objects fetched.
slacy
I'm afraid I can't agree, slacy. Transferring N responses separately over a DB connection, network latency included, would rarely be faster than one response that sends all the information at once. Using one query lets the database optimize the work as a whole, unless you have tons of joins or functions involved, which your question does not. To S.Lott's point: are you sure that fetching 2 objects is really the bottleneck in your app? Or even fetching 10? If so, Django (or any ORM) may not be for you... ORMs tend to be chatty if you're concerned with queries at the micro level.
Jarret Hardie
N concurrent queries compete against each other. You have limited connections, limited statement cache, limited data cache and limited access to the underlying files. A single query that fetches multiple rows will (probably) acquire single instances of resources without competing against another similar query.
S.Lott
I guess I'm thinking "too big". I've yet to design my Django app, but I want to make sure I do so properly from the start. My systems design background says "anything you can do in parallel to reduce latency, you should". But I'm typically working on very large multi-machine systems, where parallel queries (to many machines) are actually faster than doing things serially. I guess on a single-machine DB, you guys are right.
slacy
That's a good point. If you do have an infrastructure where you can (and need to) massively parallelize the app and the DB, then you would be more concerned with asynchronicity. It all depends on your application, I suppose. Django scales very well, so I've never worried about performance in that respect, relying instead on strategies and libraries outside the level of the code to distribute the work.
Jarret Hardie
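
(For completeness, since the thread keeps coming back to parallel queries: a minimal sketch of issuing the two lookups concurrently with a thread pool. It assumes the MyModel/compareview names from the answer above, an app called myapp, and a reasonably modern Python; whether it actually beats a single pk__in query depends on your database and network, per the discussion above.)

from concurrent.futures import ThreadPoolExecutor

from django.db import connection
from django.http import HttpResponse

from myapp.models import MyModel  # assumed app/model names


def fetch_one(pk):
    try:
        return MyModel.objects.get(pk=pk)
    finally:
        # Each worker thread gets its own thread-local DB connection;
        # close it so it isn't left dangling once the thread is done.
        connection.close()


def compareview(request, id1, id2):
    # Fire both lookups at once and wait for both to finish.
    with ThreadPoolExecutor(max_workers=2) as pool:
        o1, o2 = pool.map(fetch_one, [id1, id2])
    return HttpResponse("comparing %s and %s" % (o1.pk, o2.pk))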