I have a Django app where most of the search is driven by foreign keys. For example, assuming Student, School, State, and EducationalQualification are Django models, the user searches for Students by selecting criteria from lists of Schools, States, Degrees, Diplomas, etc. That is, a search on students is essentially an answer to the question "Show students who belong to the following schools, who belong to the following states, and who have the following degrees / diplomas".

My Django app is purely database driven - there are no documents or webpages to search.

In this case, where searching for Django models is guided mostly by the models' foreign keys, which search apps/solutions are most appropriate? The ones I have looked at all talk a lot about full-text search. I may be wrong, but I don't think that is appropriate in my case.

EDIT: - I am currently searching using Peter Herndon's approach (http://www.slideshare.net/tpherndon/django-search-presentation). But this is expected to be a high-traffic site and I am worried about speed and performance.

+2  A: 

If your Django app is purely database-driven, it would be practical to build the search application on complex lookups with Q objects, because well-implemented queries are an effective way to look up data in the database via foreign keys.
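As a rough illustration of a foreign-key-driven search, the sketch below turns the ids a user selected into `__in` lookups. The field names (`school`, `state`, `qualifications`) and the helper `build_filter_kwargs` are hypothetical, not taken from the question. Keyword lookups passed to a single `filter()` call are ANDed together; Q objects would be needed if some criteria had to be ORed instead.

```python
from typing import Dict, Iterable, List, Mapping


def build_filter_kwargs(criteria: Mapping[str, Iterable[int]]) -> Dict[str, List[int]]:
    """Turn {"school": [1, 2], "state": []} into {"school__in": [1, 2]}."""
    kwargs: Dict[str, List[int]] = {}
    for field, ids in criteria.items():
        selected = list(ids)
        if selected:  # an empty selection means "no restriction on this field"
            kwargs[f"{field}__in"] = selected
    return kwargs


# Hypothetical selection coming from the search form.
criteria = {"school": [1, 2], "state": [7], "qualifications": []}
lookups = build_filter_kwargs(criteria)
print(lookups)  # {'school__in': [1, 2], 'state__in': [7]}

# With Django available, the lookups could be applied roughly like this
# (Student is the model from the question; the field names are assumed):
#     Student.objects.filter(**lookups).distinct()
```

The `distinct()` call in the comment is there because filtering across many-to-many relations can return the same row more than once.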

panchicore
That is exactly what I am doing right now. On the other hand, I am concerned because this is going to be a high-traffic site.
chefsmart
A: 

If you are going to have high traffic, have you thought about basing your search on Solr/Haystack?

Haystack makes setting up searches based on Solr (Lucene) very simple and you basically don't have to modify your models.

http://haystacksearch.org/

You can even start with Whoosh (a non-production search engine for Haystack) and then later add on Solr.

celopes
Lucene as well as Solr seem to be focused on full-text search. I am not sure this is what I need, but I have been reading docs to understand how this fits my situation.
chefsmart
A: 

django-filter is a reusable Django application that lets users filter querysets dynamically.

try it :)

panchicore
I'd second django-filter, of course I'm also the author :)
Alex Gaynor
+1 nice contribution ;)
panchicore
A: 

If you are doing such specific field searching, which it seems you are, then advanced queries are the answer. Wait until you actually have performance problems before you worry too much about them. Getting lots of traffic is a much harder problem to solve than dealing with lots of traffic. If you do need to scale this, then remember that hardware is cheap. Just get a really sweet database server that is configured well, and have big-time caching.

You could try to use tools like Sphinx, Haystack, etc., but those are designed to handle Google-style searches, not the really specific queries that you are talking about.
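To make the caching suggestion concrete, here is a minimal sketch that derives a stable cache key from the selected criteria, so that the same search hits the cache regardless of the order in which ids were picked. The names `search_cache_key` and `cached_search`, the key prefix, and the plain dict standing in for a cache backend are all assumptions for illustration.

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, Mapping


def search_cache_key(criteria: Mapping[str, Iterable[int]],
                     prefix: str = "student-search") -> str:
    """Return the same key for the same selection, whatever order it arrived in."""
    normalized = {}
    for field, ids in criteria.items():
        unique = sorted(set(ids))
        if unique:  # ignore fields with nothing selected
            normalized[field] = unique
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"


_cache: Dict[str, object] = {}  # stand-in for a real cache backend


def cached_search(criteria: Mapping[str, Iterable[int]],
                  run_query: Callable[[Mapping[str, Iterable[int]]], object]) -> object:
    """Run the query only on a cache miss; otherwise reuse the stored result."""
    key = search_cache_key(criteria)
    if key not in _cache:
        _cache[key] = run_query(criteria)
    return _cache[key]
```

In a real project the dict would likely be replaced by Django's cache framework (`django.core.cache.cache`, with `get` and `set`), and entries would need a timeout or explicit invalidation when student data changes.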

Apreche
"Getting lots of traffic is a much harder problem", but traffic already exists here as this is an app that is going to be used by a consortium of educators and companies nationwide. I am sticking with Q objects for now.
chefsmart
A: 

To clarify celopes' answer, the way django-haystack works is by allowing you to define a rendered "document" for each model.

So say you have some models...

from django.db import models

class Teacher(models.Model):
    name = models.CharField(max_length=100)

class Course(models.Model):
    name = models.CharField(max_length=100)
    # on_delete is required on ForeignKey since Django 2.0
    teacher = models.ForeignKey(Teacher, on_delete=models.CASCADE)

class Student(models.Model):
    name = models.CharField(max_length=100)
    grade = models.IntegerField()
    # Course.students is the reverse accessor used in the template
    classes = models.ManyToManyField(Course, related_name='students')

class Grade(models.Model):
    value = models.CharField(max_length=1)
    course = models.ForeignKey(Course, on_delete=models.CASCADE)
    student = models.ForeignKey(Student, on_delete=models.CASCADE, related_name='grades')

In haystack, you'd define a template to render Course...

{% comment %} In this context 'object' represents a Course model {% endcomment %}
<h1>{{ object.name }}</h1>
<h2>{{ object.teacher.name }}</h2>
<ul>
{% for student in object.students.all %}
    <li>{{ student.name }}</li>
{% endfor %}
</ul>

In this way you're sort of defining not only a 'document' that represents each Course model, but you're also specifying a priority for pieces of information based on the HTML markup (h1 is more important than h2 which is more important than li).

In terms of overhead these 'documents' are rendered by using the haystack management command...

> manage.py reindex

In this way you could set up a cron/Scheduler job to reindex at whatever interval you are comfortable with.

Solr also includes some neat extras, such as spelling suggestions. I initially tried Whoosh with Haystack but was disappointed by it doing funny things with queries containing a hyphen. Haystack+Solr is a nice combo.

T. Stone