+2  A: 

The big downside of your current approach is its inefficiency with large search result sets, as you have to pull down the entire result set from the database each time, even though you only intend to display one page of results.

In order to only pull down the objects you actually need from the database, you have to use pagination on a QuerySet, not a list. If you do this, Django actually slices the QuerySet before the query is executed, so the SQL query will use OFFSET and LIMIT to only get the records you will actually display. But you can't do this unless you can cram your search into a single query somehow.
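
To make the OFFSET and LIMIT point concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Django. The table name `page` and the helper `fetch_page` are made up for illustration; the query is only meant to show roughly the shape of SQL that a sliced QuerySet such as `qs[10:20]` turns into.

```python
import sqlite3


def fetch_page(conn, term, page_number, per_page=10):
    """Return one page of matching titles; at most per_page rows are fetched."""
    offset = (page_number - 1) * per_page
    rows = conn.execute(
        "SELECT title FROM page WHERE title LIKE ? ORDER BY id LIMIT ? OFFSET ?",
        ("%" + term + "%", per_page, offset),
    )
    return [title for (title,) in rows]


# Hypothetical sample data: 35 rows that all match the search term.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO page (title) VALUES (?)",
    [("django tip %d" % i,) for i in range(1, 36)],
)

second_page = fetch_page(conn, "django", page_number=2)
last_page = fetch_page(conn, "django", page_number=4)
```

The database does the skipping and truncating, so the application never holds more than one page of rows, no matter how many records match.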

Given that all three of your models have title and body fields, why not use model inheritance? Just have all three models inherit from a common ancestor that has title and body, and perform the search as a single query on the ancestor model.

Carl Meyer
Yeah, I see now that it isn't efficient at all. And I see that model inheritance may be the way to go, but then I'll have to rewrite my models, and I was hoping not to have to do that. But it should work.
Espen Christensen
A: 

Generic views don't work in all instances, and in this case, it's probably easier and faster to write your own view that combines the three rather than fight with the framework. Also, if the three models do happen to be related, consider writing an abstract base class that the three inherit from.

thirtyseven
They actually don't have any relation to each other, other than being in the same project/site. I have actually written a view for this to try to make it work, but that doesn't solve the problem. I still can't merge the querysets into one.
Espen Christensen
+12  A: 

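You can use the QuerySetChain class below. When used with Django's paginator, it should hit the database with COUNT(*) queries for all querysets, and with SELECT queries only for the querysets that have to be iterated to reach the current page: those whose records are displayed, plus any that come before them in the chain. Querysets later in the chain are not evaluated.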

Note that you need to specify template_name= if using a QuerySetChain with generic views, even if the chained querysets all use the same model.

from itertools import islice, chain

class QuerySetChain(object):
    """
    Chains multiple subquerysets (possibly of different models) and behaves as
    one queryset.  Supports minimal methods needed for use with
    django.core.paginator.
    """

    def __init__(self, *subquerysets):
        self.querysets = subquerysets

    def count(self):
        """
        Performs a .count() for all subquerysets and returns the number of
        records as an integer.
        """
        return sum(qs.count() for qs in self.querysets)

    def _clone(self):
        "Returns a clone of this queryset chain"
        return self.__class__(*self.querysets)

    def _all(self):
        "Iterates records in all subquerysets"
        return chain(*self.querysets)

    def __getitem__(self, ndx):
        """
        Retrieves an item or slice from the chained set of results from all
        subquerysets.
        """
        if isinstance(ndx, slice):
            return list(islice(self._all(), ndx.start, ndx.stop, ndx.step or 1))
        else:
            # The next() built-in (Python 2.6+) replaces the Python 2-only .next() method.
            return next(islice(self._all(), ndx, ndx + 1))

In your example, the usage would be:

from django.db.models import Q

pages = Page.objects.filter(Q(title__icontains=cleaned_search_term) |
                            Q(body__icontains=cleaned_search_term))
articles = Article.objects.filter(Q(title__icontains=cleaned_search_term) |
                                  Q(body__icontains=cleaned_search_term) |
                                  Q(tags__icontains=cleaned_search_term))
posts = Post.objects.filter(Q(title__icontains=cleaned_search_term) |
                            Q(body__icontains=cleaned_search_term) |
                            Q(tags__icontains=cleaned_search_term))
matches = QuerySetChain(pages, articles, posts)

Then use matches with the paginator like you used result_list in your example.

The itertools module was introduced in Python 2.3, so it should be available in all Python versions Django runs on.
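
One thing worth checking is which sub-querysets actually get iterated when the paginator asks for a page. The sketch below is not Django code: `FakeQuerySet` is a hypothetical stand-in that records when it is iterated, and `QuerySetChain` is a trimmed copy of the class above (only `count()` and slicing), written with the `next()` built-in so that it also runs on Python 3. It suggests that querysets after the requested page are left alone, while querysets before it are still walked through, because islice has to skip past their rows.

```python
from itertools import chain, islice


class FakeQuerySet:
    """Hypothetical stand-in for a QuerySet that records when it is iterated."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.iterated = False

    def count(self):
        return len(self.rows)

    def __iter__(self):
        self.iterated = True  # a real QuerySet would run its SELECT here
        return iter(self.rows)


class QuerySetChain:
    """Trimmed copy of the class above: just count() and indexing/slicing."""

    def __init__(self, *subquerysets):
        self.querysets = subquerysets

    def count(self):
        return sum(qs.count() for qs in self.querysets)

    def __getitem__(self, ndx):
        items = chain(*self.querysets)
        if isinstance(ndx, slice):
            return list(islice(items, ndx.start, ndx.stop, ndx.step or 1))
        return next(islice(items, ndx, ndx + 1))


pages = FakeQuerySet("pages", ["page-%d" % i for i in range(3)])
articles = FakeQuerySet("articles", ["article-%d" % i for i in range(3)])
posts = FakeQuerySet("posts", ["post-%d" % i for i in range(3)])
matches = QuerySetChain(pages, articles, posts)

total = matches.count()     # counts every source without iterating any of them
second_page = matches[3:6]  # the slice a paginator requests for page 2 at 3 per page
touched = [qs.name for qs in (pages, articles, posts) if qs.iterated]
```

Here `touched` ends up containing "pages" and "articles" but not "posts", even though no row from `pages` appears on the second page.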

akaihola
Nice approach, but one problem I see here is that the query sets are appended "head-to-tail". What if each queryset is ordered by date and one needs the combined-set to also be ordered by date?
hasen j
This certainly looks promising. Great, I'll have to try that, but I don't have time today. I'll get back to you if it solves my problem. Great work.
Espen Christensen
OK, I had to try it today, but it didn't work. First it complained that it didn't have a _clone attribute, so I added one by just copying _all, and that worked. But it seems that the paginator has some problem with this queryset. I get this paginator error: "len() of unsized object"
Espen Christensen
@Espen It seems that since revision 8121 of Django the paginator always first tries to call the count() method. If your Django is older than that, try renaming count() to __len__().
akaihola
@hasen If sorting is needed, a simple list concatenation works best (if you don't want to do tricks with the database schema). See the "Concatenating the querysets into a list..." answer.
akaihola
@Espen I added a proper _clone() method to the class. It should now work with the object_list generic view, if template_name= is specified.
akaihola
@Espen We hit a difference between Python 2.4 and 2.5. In 2.5 the islice function accepts a step value of None and interprets it as 1. The fix is trivial. By the way, I sense that learning Python debugging techniques plus some Django specific tricks would benefit you a lot. Amazing stuff out there.
akaihola
@akaihola Thanks! You've been most helpful! And yes, I should definitely learn more. I've just started using Django and Python, so there is a lot to learn, but I'm getting there slowly. Any tips on where to find good resources on debugging? Thanks again!
Espen Christensen
@Espen Python library: pdb, logging. External: IPython, ipdb, django-logging, django-debug-toolbar, django-command-extensions, werkzeug. Use print statements in code or use the logging module. Above all, learn to introspect in the shell. Google for blog posts about debugging Django. Glad to help!
akaihola
+15  A: 

Concatenating the querysets into a list is the simplest approach. If the database will be hit for all querysets anyway (e.g. because the result needs to be sorted), this won't add further cost.

from itertools import chain
result_list = list(chain(page_list, article_list, post_list))

Using itertools.chain is faster than looping over each list and appending elements one by one, since itertools is implemented in C. It also consumes less memory than converting each queryset into a list before concatenating.

Now it's possible to sort the resulting list e.g. by date (as requested in hasen j's comment to another answer). The sorted() function conveniently accepts a generator and returns a list:

result_list = sorted(
    chain(page_list, article_list, post_list),
    key=lambda instance: instance.date_created)

If you're using Python 2.4 or later, you can use attrgetter instead of a lambda. I remember reading about it being faster, but I didn't see a noticeable speed difference for a million item list.

from operator import attrgetter
result_list = sorted(
    chain(page_list, article_list, post_list),
    key=attrgetter('date_created'))
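
For trying this out without a database, here is a small self-contained sketch. The namedtuples are hypothetical stand-ins for model instances, and the titles and dates are invented; `reverse=True` is an addition to show newest-first ordering.

```python
from collections import namedtuple
from datetime import date
from itertools import chain
from operator import attrgetter

# Hypothetical stand-ins for model instances; only date_created matters here.
Page = namedtuple("Page", "title date_created")
Article = namedtuple("Article", "title date_created")
Post = namedtuple("Post", "title date_created")

page_list = [Page("About", date(2009, 1, 5))]
article_list = [Article("Release notes", date(2009, 1, 12)),
                Article("Roadmap", date(2009, 1, 2))]
post_list = [Post("Hello world", date(2009, 1, 9))]

# Newest first, regardless of which model each record came from.
result_list = sorted(
    chain(page_list, article_list, post_list),
    key=attrgetter("date_created"),
    reverse=True)

titles = [item.title for item in result_list]
```

The sort only needs every object to expose the same attribute name, so the three models do not have to share a base class.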
akaihola
Drive by voting: finding this post has proved massively useful to me today. Thanks!
Keryn Knight
A: 

Here's an idea: just pull down one full page of results from each of the three querysets, then throw out the 20 least useful ones. This avoids fetching the large querysets in full, so you only sacrifice a little performance instead of a lot.
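
A rough sketch of how that could look, assuming "useful" means a score that each result set is already ordered by. The function name `merged_page` and the sample data are hypothetical. For pages beyond the first, one page per source is no longer enough, so the sketch reads the first `page_number * per_page` items from each source instead.

```python
from itertools import chain


def merged_page(sources, page_number, per_page, key):
    """Return one page of the merged ranking.

    Reads at most page_number * per_page items from each source. Every
    source must already be ordered best-first by the same key.
    """
    needed = page_number * per_page
    candidates = chain.from_iterable(source[:needed] for source in sources)
    ranked = sorted(candidates, key=key, reverse=True)
    return ranked[needed - per_page:needed]


# Plain lists of (score, title) stand in for ordered querysets.
pages = [(0.9, "page A"), (0.4, "page B"), (0.1, "page C")]
articles = [(0.8, "article A"), (0.7, "article B"), (0.2, "article C")]
posts = [(0.6, "post A"), (0.5, "post B"), (0.3, "post C")]
sources = [pages, articles, posts]


def by_score(row):
    return row[0]


first = merged_page(sources, page_number=1, per_page=2, key=by_score)
second = merged_page(sources, page_number=2, per_page=2, key=by_score)
```

With real querysets the `source[:needed]` slice would become a LIMIT clause, so the cost grows with the page number rather than with the total number of matches.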

Jiaaro
+6  A: 

Try this:

matches = pages | articles | posts

This retains all the functionality of the querysets, which is nice if you want to use order_by or similar.

Oops, please note that this doesn't work on querysets from two different models...

+1  A: 

Looks like t_rybik has created a comprehensive solution at http://www.djangosnippets.org/snippets/1933/

akaihola
A: 

For searching it's better to use dedicated solutions like Haystack - it's very flexible.

minder